Multi-pair gigabit ethernet transceiver

ABSTRACT

Various systems and methods providing high-speed decoding, enhanced power reduction and clock domain partitioning for a multi-pair gigabit Ethernet transceiver are disclosed. ISI compensation is partitioned into two stages: a first stage, in a demodulator, compensates ISI components induced by characteristics of a transmitter's partial response pulse shaping filter; a second stage, in a Viterbi decoder, compensates ISI components induced by characteristics of a multi-pair transmission channel. High-speed decoding is accomplished by reducing the DFE depth: an input signal is provided from a multiple decision feedback equalizer to the Viterbi decoder based on a tail value and a subset of coefficient values received from a unit-depth decision feedback equalizer. Power reduction is accomplished by adaptively truncating active taps in the NEXT, FEXT and echo cancellation filters, or by disabling portions of the decoder circuitry, as channel response characteristics allow. A receive clock signal is generated such that it is synchronous in frequency with the analog sampling clock signals and has a particular phase offset with respect to one of the sampling clock signals. This phase offset is adjusted such that system performance degradation due to coupling of switching noise from the digital sections to the analog sections is substantially minimized.

CROSS REFERENCE TO RELATED APPLICATIONS

[0001] The present application claims priority on the basis of the following provisional applications: Serial No. 60/130,616 entitled “Multi-Pair Gigabit Ethernet Transceiver” filed on Apr. 22, 1999, Serial No. 60/116,946 entitled “Multiple Decision Feedback Equalizer” filed on Jan. 20, 1999, Serial No. 60/108,648 entitled “Clock Generation and Distribution in an Ethernet Transceiver” filed on Nov. 16, 1998, Serial No. 60/108,319 entitled “Gigabit Ethernet Transceiver” filed on Nov. 13, 1998, Serial No. 60/107,874 entitled “Apparatus for and Method of Distributing Clock Signals in a Communication System” filed Nov. 9, 1998, and Serial No. 60/107,880 entitled “Apparatus for and Method of Reducing Power Dissipation in a Communication System” filed Nov. 9, 1998.

[0002] The present application is related to the following co-pending applications, commonly owned by the assignee of the present application, the entire contents of each of which are expressly incorporated herein by reference: Ser. No. 09/370,370 entitled “System and Method for Trellis Decoding in a Multi-Pair Transceiver System”, Ser. No. 09/370,353 entitled “Multi-Pair Transceiver Decoder System with Low Computation Slicer”, Ser. No. 09/370,354 entitled “System and Method for High Speed Decoding and ISI Compensation in a Multi-Pair Transceiver System”, Ser. No. 09/370,491 entitled “High-Speed Decoder for Multi-Pair Gigabit Transceiver”, all filed Oct. 10, 1999, and Ser. No. 09/390,856 entitled “Dynamic regulation of Power Consumption in a High-Speed Communication System” filed Sep. 3, 1999.

[0003] The present application is also related to the following co-pending applications, filed on instant date herewith and commonly owned by the assignee of the present application, the entire contents of each of which are expressly incorporated herein by reference: Ser. No. 09/437,721 entitled “Timing Recovery System for a Multi-Pair Gigabit Transceiver” and Ser. No. 09/437,724 entitled “Switching Noise Reduction in a Multi-Clock Domain Transceiver”.

FIELD OF THE INVENTION

[0004] The present invention relates generally to high-speed networking transceivers and, more particularly, to gigabit Ethernet transceivers that have reduced power consumption and efficient clock domain partitioning, and that are able to decode input symbols within a symbol period with a minimum of computational intensity.

DESCRIPTION OF THE RELATED ART

[0005] In recent years, local area network (LAN) applications have become more and more prevalent as a means for providing local interconnect between personal computer systems, workstations and servers. Because of the breadth of its installed base, the 10BASE-T implementation of Ethernet remains the most pervasive, if not the dominant, network technology for LANs. However, as the need to exchange information becomes more and more imperative, and as the scope and size of the information being exchanged increases, higher and higher speeds (greater bandwidth) are required from network interconnect technologies. Among the high-speed LAN technologies currently available, fast Ethernet, commonly termed 100BASE-T, has emerged as the clear technological choice. Fast Ethernet technology provides a smooth, non-disruptive evolution from the 10 megabit per second (Mbps) performance of 10BASE-T applications to the 100 Mbps performance of 100BASE-T. The growing use of 100BASE-T interconnections between servers and desktops is creating a definite need for an even higher speed network technology at the backbone and server level.

[0006] One of the more suitable solutions to this need has been proposed in the IEEE 802.3ab standard for gigabit Ethernet, also termed 1000BASE-T. Gigabit Ethernet is defined as able to provide 1 gigabit per second (Gbps) bandwidth in combination with the simplicity of an Ethernet architecture, at a lower cost than other technologies of comparable speed. Moreover, gigabit Ethernet offers a smooth, seamless upgrade path for present 10BASE-T or 100BASE-T Ethernet installations.

[0007] In order to obtain the requisite gigabit performance levels, gigabit Ethernet transceivers are interconnected with a multi-pair transmission channel architecture. In particular, transceivers are interconnected using four separate pairs of twisted Category-5 copper wires. Gigabit communication, in practice, involves the simultaneous, parallel transmission of four information signals, each conveying information at a rate of 250 megabits per second (Mbps). Simultaneous, parallel transmission of four information signals over four twisted wire pairs poses substantial challenges to bidirectional communication transceivers, even though the data rate on any one wire pair is “only” 250 Mbps.

[0008] In particular, the gigabit Ethernet standard requires that digital information being processed for transmission be symbolically represented in accordance with a five-level pulse amplitude modulation scheme (PAM-5) and encoded in accordance with an 8-state Trellis coding methodology. Coded information is then communicated over a multi-dimensional parallel transmission channel to a designated receiver, where the original information must be extracted (demodulated) from a multi-level signal. In gigabit Ethernet, it is important to note that it is the concatenation of signal samples received simultaneously on all four twisted pair lines of the channel that defines a symbol. Thus, demodulator/decoder architectures must be implemented with a degree of computational complexity that allows them to accommodate not only the “state width” of Trellis coded signals, but also the “dimensional depth” represented by the transmission channel.

[0009] Computational complexity is not the only challenge presented to modern gigabit-capable communication devices. A perhaps greater challenge is that the complex computations required to process “deep” and “wide” signal representations must be performed in an almost vanishingly small period of time. For example, in gigabit applications, each of the four-dimensional signal samples, formed by the four signals received simultaneously over the four twisted wire pairs, must be efficiently decoded within a particular allocated symbol time window of about 8 nanoseconds.
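The 8-nanosecond budget follows directly from the symbol clock. A minimal sketch of the arithmetic, assuming the 1000BASE-T values of 125 Msymbols/s on each pair and 8 data bits per four-dimensional symbol (the 125 MHz clock appears in [0013]; the 8-bit figure is taken from the standard rather than from this text):

```python
# Assumed 1000BASE-T parameters: one PAM-5 symbol per pair per 8 ns clock,
# and 8 data bits carried by each four-dimensional (four-pair) symbol.
SYMBOL_RATE_HZ = 125e6     # symbols per second on each pair
PAIRS = 4                  # twisted pairs in the channel
BITS_PER_4D_SYMBOL = 8     # data bits per four-dimensional symbol

symbol_period_ns = 1e9 / SYMBOL_RATE_HZ                      # decode time budget
bits_per_pair_symbol = BITS_PER_4D_SYMBOL / PAIRS            # 2 bits per pair
per_pair_rate_bps = SYMBOL_RATE_HZ * bits_per_pair_symbol    # per-pair data rate
aggregate_rate_bps = per_pair_rate_bps * PAIRS               # total data rate

print(f"symbol period: {symbol_period_ns:.1f} ns")
print(f"per-pair rate: {per_pair_rate_bps / 1e6:.0f} Mb/s")
print(f"aggregate:     {aggregate_rate_bps / 1e9:.0f} Gb/s")
```

Each pair therefore carries 2 data bits per PAM-5 symbol, and the decoder has a single 8 ns period in which to resolve each 4D symbol.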

[0010] Successfully accomplishing the multitude of sequential processing operations required to decode gigabit signal samples within an 8-nanosecond window requires that the switching capabilities of the integrated circuit technology from which the transceiver is constructed be pushed almost to their fundamental limits. If performed in conventional fashion, the sequential signal processing operations necessary for signal decoding and demodulation would result in a propagation delay through the logic circuits that would exceed the clock period, rendering the transceiver circuit non-functional. Fundamentally, then, the challenge imposed by timing constraints must be addressed if gigabit Ethernet is to retain its viability and achieve the same reputation for accurate and robust operation enjoyed by its 10BASE-T and 100BASE-T siblings.

[0011] In addition to the challenges imposed by decoding and demodulating multilevel signal samples, transceiver systems must also be able to deal with intersymbol interference (ISI) introduced by transmission channel artifacts as well as by modulation and pulse shaping components in the transmission path of a remote transceiver system. During the demodulation and decoding of Trellis coded information, ISI components introduced by either means must also be considered and compensated, further expanding the computational complexity and, thus, the system latency of the transceiver system. Without a transceiver system capable of efficient, high-speed signal decoding as well as simultaneous ISI compensation, gigabit Ethernet would likely not remain a viable concept.

[0012] In a Gigabit Ethernet communication system that conforms to the 1000BASE-T standard, gigabit transceivers are connected via Category 5 twisted pairs of copper cables. Cable responses vary drastically among different cables. Thus, the computations, and hence the power consumption, required to compensate for noise (such as echo, near-end crosstalk and far-end crosstalk) will vary widely depending on the particular cable that is used.

[0013] In integrated circuit technology, power consumption is generally recognized as being a function of the switching (clock) speed of the transistor elements making up the circuitry, as well as the number of component elements operating within a given time period. The more transistor elements operating at one time, and the higher the operational speed of the component circuitry, the higher the relative degree of power consumption for that circuit. This is particularly relevant in the case of Gigabit Ethernet, since all computational circuits are clocked at 125 MHz (corresponding to 250 Mbps per twisted pair of cable), and the processing requirements of such circuits call for rather large blocks of computational circuitry, particularly in the filter elements. Power consumption figures in the range of about 4.5 Watts to about 6.0 Watts are not unreasonable when the speed and complexity of modern gigabit communication circuitry are considered.
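As a rough first-order model (the standard CMOS dynamic-power approximation, which this text does not itself state), the switching power of a block can be written as

$$P_{\text{dyn}} \approx \alpha \, C \, V_{DD}^{2} \, f$$

where $\alpha$ is the fraction of nodes that toggle per clock cycle, $C$ is the total switched capacitance (which grows with the number of active transistor elements), $V_{DD}$ is the supply voltage, and $f$ is the clock frequency (125 MHz here). With $f$ fixed by the symbol rate, the practical lever is $C$: disabling a fraction of a filter's taps removes roughly that fraction of the filter's switched capacitance.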

[0014] Pertinent to an analysis of power consumption is the realization that power is dissipated, in integrated circuits, as heat. As power consumption increases, the system must be provided not only with a more robust power supply, but also with enhanced heat dissipation schemes, such as heat sinks (dissipation fins coupled to the IC package), cooling fans, increased interior volume for enhanced air flow, and the like. All of these dissipation schemes involve considerable additional manufacturing costs and an extended design cycle due to the need to plan for thermal considerations.

[0015] Prior high-speed communication circuits have not adequately addressed these thermal considerations, because of the primary necessity of accommodating high data rates with a sufficient level of signal quality. Prior devices have, in effect, “hard wired” their processing capability, such that processing circuitry is always operative to maximize signal quality, whether that degree of processing is required or not. Where channel quality is high, full-filter-tap signal processing is subject to diminishing returns: additional large blocks of active filter circuitry recover only very small incremental gains in noise margin.

[0016] This trade-off between power consumption and signal quality has heretofore limited the options available to an integrated circuit communication system designer. If low power consumption is made a system requirement, the system typically exhibits poor noise margin or bit-error-rate performance. Conversely, if system performance is made the primary requirement, power consumption must fall where it may, with the corresponding consequences to system cost and reliability.

[0017] Accordingly, there is a need for a high-speed integrated circuit communication system design which is able to accommodate a wide variety of worst-case channel (cable) responses, while adaptively evaluating signal quality metrics so that processing circuitry might be disabled, and power consumption thereby reduced, at any time that the circuitry is not necessary to assure a given minimum level of signal quality.

[0018] Such a system should be able to adaptively determine and achieve the highest level of signal quality consistent with a given maximum power consumption specification. In addition, such a system should be able to adaptively determine and achieve the lowest level of power consumption consistent with a given minimum signal quality specification.

SUMMARY OF THE INVENTION

[0019] The present invention is a method and a system for providing an input signal from a multiple decision feedback equalizer to a decoder based on a tail value and a subset of coefficient values received from a decision-feedback equalizer. A set of pre-computed values based on the subset of coefficient values is generated. Each of the pre-computed values is combined with the tail value to generate a tentative sample. One of the tentative samples is selected as the input signal to the decoder.

[0020] In one aspect of the system, tentative samples are saturated and then stored in a set of registers before being output to a multiplexer, which selects one of the tentative samples as the input signal to the decoder. Storing the tentative samples in the registers before providing them to the multiplexer facilitates high-speed operation by breaking up a critical path of computations into substantially balanced first and second portions, the first portion including computations in the decision-feedback equalizer and the multiple decision feedback equalizer, and the second portion including computations in the decoder.

[0021] The present invention can be directed to a system and method for decoding and ISI-compensating received signal samples, modulated for transmission in accordance with a multi-level alphabet and encoded in accordance with a multi-state encoding scheme. Modulated and encoded signal samples are received and decoded in an integrated circuit receiver which includes a multi-state signal decoder. The multi-state signal decoder includes a symbol decoder adapted to receive a set of signal samples representing multi-state signals and to evaluate the multi-state signals in accordance with the multi-level modulation alphabet and the multi-state encoding scheme. The symbol decoder outputs tentative decisions.

[0022] An ISI compensation circuit is configured to provide ISI-compensated signal samples to the symbol decoder. The ISI compensation circuit is constructed of a single decision feedback equalizer, which provides ISI-compensated signal samples to the symbol decoder based on tentative decisions output by the symbol decoder.

[0023] In one aspect of the invention, a path memory module is coupled to the symbol decoder and receives decisions and error terms from the symbol decoder. The path memory module includes a plurality of sequential registers, each corresponding to a respective one of consecutive time intervals. The registers store decisions corresponding to the respective states of the multi-state encoded signals. Decision circuitry selects a best decision from corresponding ones of the registers, with the best decision of a distal register defining a final decision. The best decision of an intermediate register defines a tentative decision, which is output to the ISI compensation circuit.

[0024] The single decision feedback equalizer is configured as an FIR filter and is characterized by a multiplicity of coefficients, subdivided into a set of high-order coefficients and a set of low-order coefficients. Tentative decisions from the path memory module are forced to the single decision feedback equalizer at various locations along the filter delay line and are combined with the high-order coefficients in order to define a partial ISI component. The partial ISI component is arithmetically combined with an input signal sample in order to generate a partially ISI-compensated intermediate signal called the tail signal.
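The tail computation can be modeled in a few lines. This is an illustrative sketch, not the disclosed circuit: the helper name `tail_value`, the coefficient ordering, and the use of subtraction as the "arithmetic combination" are assumptions. Following [0029], the two lowest-order taps are excluded from the tail.

```python
def tail_value(z_n, coeffs, past_decisions):
    """Partially ISI-compensated 'tail' sample (hypothetical helper).

    z_n            -- input signal sample at time n
    coeffs         -- DFE coefficients [c1, c2, c3, ...]; ck weights the
                      decision made k symbol periods ago (assumed ordering)
    past_decisions -- tentative decisions [a(n-1), a(n-2), a(n-3), ...]

    The two low-order taps (c1, c2) are skipped; they are left to the
    pre-computed values.  Only the high-order taps contribute here.
    """
    partial_isi = sum(c * a for c, a in zip(coeffs[2:], past_decisions[2:]))
    return z_n - partial_isi   # subtraction is an assumed sign convention


# Example: a 4-tap DFE with PAM-5 decisions
tail = tail_value(1.0, [0.5, 0.25, 0.1, 0.05], [1, -1, 2, -2])
```

Here only `0.1 * 2` and `0.05 * -2` are removed from the input, leaving a tail of about 0.9; the contribution of the two most recent decisions is still outstanding.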

[0025] Low-order coefficients from the single decision feedback equalizer are directed to a convolution engine, wherein they are combined with values representing the levels of a multi-level modulation alphabet. The convolution engine outputs a multiplicity of signals representing the convolution results, each of which is arithmetically combined with the tail signal to define a set of ISI-compensated tentative signal samples.

[0026] In a particular aspect of the invention, the ISI-compensated tentative signal samples are saturated and then stored in a set of registers before being output to a multiplexer circuit, which selects one of the tentative signal samples as the input signal to the symbol decoder. Storing the tentative signal samples in the set of registers before providing them to the multiplexer facilitates high-speed operation by breaking up a critical path of computations into substantially balanced first and second portions, the first portion including computations in the ISI compensation circuitry (the single decision feedback equalizer and the multiple decision feedback equalizer), and the second portion including computations in the symbol decoder.

[0027] In a further aspect of the present invention, the symbol decoder circuitry is implemented as a Viterbi decoder, the Viterbi decoder computing path metrics for each of the N states of a Trellis code and outputting decisions based on the path metrics. A path memory module is coupled to the Viterbi decoder for receiving decisions. The path memory module is implemented with a number of depth levels corresponding to consecutive time intervals. Each of the depth levels includes N registers for storing decisions corresponding to the N states of the trellis code. Each of the depth levels further includes a multiplexer for selecting a best decision from the corresponding N registers, the best decision at the last depth level defining the final decision, and the best decisions at other selected depth levels defining tentative decisions.
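A register-exchange path memory is one way to realize the depth levels described above. The sketch below assumes that each state's row holds that state's survivor decisions, that the multiplexer at every depth level picks the row of the state with the smallest path metric, and that depth levels 0 to 2 supply the tentative decisions (per [0028]). The class and method names are hypothetical.

```python
class PathMemory:
    """Register-exchange path memory (illustrative sketch).

    regs[s][d] is the decision stored for state s at depth level d,
    where d = 0 is the most recent time interval.
    """

    def __init__(self, n_states, depth):
        self.n_states = n_states
        self.depth = depth
        self.regs = [[None] * depth for _ in range(n_states)]

    def update(self, new_decisions, predecessors):
        """Advance one symbol period.

        new_decisions[s] -- decision on the surviving branch into state s
        predecessors[s]  -- state from which that branch originates
        """
        # Each state inherits its predecessor's history, shifted one level deeper.
        self.regs = [
            [new_decisions[s]] + self.regs[predecessors[s]][: self.depth - 1]
            for s in range(self.n_states)
        ]

    def outputs(self, path_metrics, tentative_levels=(0, 1, 2)):
        """Final decision (deepest level) and tentative decisions, best state."""
        best = min(range(self.n_states), key=lambda s: path_metrics[s])
        row = self.regs[best]
        return row[-1], [row[d] for d in tentative_levels]
```

In hardware the per-level multiplexers select in parallel; the sketch reads one row of the best state, which gives the same values.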

[0028] In a particular aspect of the invention, tentative decisions are generated from the first three depth levels of the path memory module. These tentative decisions are forced to a single decision feedback equalizer to generate a partial ISI component based on the first three tentative decisions and a set of high-order coefficients. The partial ISI component is arithmetically combined with an input signal sample in order to define a partially ISI-compensated tentative signal sample.

[0029] The first two coefficients of the single decision feedback equalizer are linearly combined with values representing the five levels of a PAM-5 symbol alphabet, thereby generating a set of 25 pre-computed values, each of which is arithmetically combined with the partially ISI-compensated signal sample to develop a set of 25 samples, one of which is a fully ISI-compensated signal sample and is chosen as the input to the symbol decoder.
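A minimal sketch of the 25-candidate scheme, assuming the two low-order coefficients weight the two most recent decisions, that "arithmetically combined" means subtraction, and that the multiplexer is indexed by those two decisions once the decoder supplies them. Function names are hypothetical; saturation and registering are omitted.

```python
from itertools import product

PAM5 = (-2, -1, 0, 1, 2)   # the five PAM-5 levels


def precompute(c1, c2):
    """25 values c1*a1 + c2*a2, one per (a1, a2) pair of PAM-5 levels."""
    return {(a1, a2): c1 * a1 + c2 * a2 for a1, a2 in product(PAM5, PAM5)}


def tentative_samples(tail, precomputed):
    """Combine every pre-computed value with the tail (subtraction assumed)."""
    return {key: tail - value for key, value in precomputed.items()}


def select_sample(candidates, a1, a2):
    """Multiplexer: choose the candidate indexed by the two latest decisions."""
    return candidates[(a1, a2)]


pre = precompute(0.5, 0.25)           # refreshed only when c1, c2 change
cands = tentative_samples(0.9, pre)   # one subtraction per candidate
x = select_sample(cands, 1, -1)       # equals 0.9 - 0.5*1 - 0.25*(-1)
```

Because the candidates depend only on the tail and on slowly changing coefficients, they can be formed before the two most recent decisions are known; the remaining step in the critical path is a selection rather than two multiplications and two additions.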

[0030] The present invention is further directed to a system and method for decoding information signals modulated in accordance with a multi-level modulation scheme and encoded in accordance with a multi-state encoding scheme, by computing a distance between a received word and a codeword included in a plurality of code-subsets. Codewords are formed from a concatenation of symbols from a multi-level alphabet, with the symbols selected from two disjoint symbol-subsets X and Y. A received word is represented by L inputs, with L representing the number of dimensions of a multi-dimensional communication channel. Each of the L inputs uniquely corresponds to one of the L dimensions.

[0031] A set of 1-dimensional (1D) errors is produced from the L inputs, with each of the 1D errors representing a distance metric between a respective one of the L inputs and a symbol in one of the two disjoint symbol-subsets. The 1D errors are combined in order to produce a set of L-dimensional errors such that each of the L-dimensional errors represents a distance between the received word and the nearest codeword in one of the code-subsets.

[0032] In one embodiment of the invention, each of the L inputs is sliced with respect to each of the two disjoint symbol-subsets X and Y in order to produce a set of X-based errors, a set of Y-based errors and corresponding sets of X-based and Y-based decisions. The sets of X-based and Y-based errors form the set of 1D errors, while the sets of X-based and Y-based decisions form a set of 1D decisions. Each of the X-based and Y-based decisions corresponds to the symbol, in the corresponding symbol-subset, closest in distance (value) to one of the L inputs. Each of the 1D errors represents a distance metric between a corresponding 1D decision and the respective one of the L inputs.
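The dual-subset slicing can be sketched as follows. The partition X = {-1, +1}, Y = {-2, 0, +2} is an assumption here (it matches the 1000BASE-T partition of the PAM-5 levels), as is the use of squared distance as the distance metric. The function names are hypothetical.

```python
X = (-1, 1)      # assumed symbol-subset X (odd PAM-5 levels)
Y = (-2, 0, 2)   # assumed symbol-subset Y (even PAM-5 levels)


def slice_1d(r, subset):
    """Nearest symbol in `subset` to input r, and its squared error."""
    decision = min(subset, key=lambda s: (r - s) ** 2)
    return decision, (r - decision) ** 2


def slice_xy(inputs):
    """Slice each of the L inputs against both X and Y.

    Returns one dict per dimension: {"X": (decision, error),
                                     "Y": (decision, error)}.
    """
    return [{"X": slice_1d(r, X), "Y": slice_1d(r, Y)} for r in inputs]


decisions = slice_xy([0.3, -1.7, 1.1, 0.0])   # L = 4 inputs
```

For the input 0.3, the X-based decision is +1 with error 0.49 and the Y-based decision is 0 with error 0.09; both pairs are kept, since the trellis decides later which subset the dimension belongs to.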

[0033] In another embodiment of the invention, each of the L inputs is sliced with respect to each of the two disjoint symbol-subsets X and Y in order to produce a set of 1D decisions. Each of the L inputs is further sliced with respect to a symbol-set including all of the symbols of the two disjoint symbol-subsets in order to produce a set of hard decisions. The X-based and Y-based 1D decisions are combined with the set of hard decisions in order to produce a set of 1D errors, with each of the 1D errors representing a distance metric between a corresponding 1D decision and a respective one of the L inputs.

[0034] In one embodiment of the present invention, 1-dimensional errors are combined in a first set of adders in order to produce a set of 2-dimensional errors. A second set of adders combines the 2-dimensional errors in order to produce intermediate L-dimensional errors, with the intermediate L-dimensional errors being arranged into pairs of errors such that the pairs of errors correspond one-to-one to the code-subsets. A minimum-select module determines a minimum for each of the pairs of errors. Once determined, the minima are defined as the L-dimensional errors.
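For L = 4, the two adder banks and the minimum-select step can be modeled as below. The sketch assumes each 4D code-subset is the union of one X/Y pattern and its X/Y complement (as in the 1000BASE-T code-subset construction), so the 16 intermediate 4D errors fall into 8 pairs; each pair is keyed here by its X-leading pattern. The function names and dictionary layout are hypothetical.

```python
from itertools import product

SUBSETS = ("X", "Y")


def errors_2d(e_a, e_b):
    """First adder bank: 2D errors for the four X/Y patterns of two dimensions."""
    return {p + q: e_a[p] + e_b[q] for p, q in product(SUBSETS, SUBSETS)}


def errors_4d(e1d):
    """e1d: list of four dicts {"X": err, "Y": err}, one per dimension."""
    lo = errors_2d(e1d[0], e1d[1])   # dimensions 1-2
    hi = errors_2d(e1d[2], e1d[3])   # dimensions 3-4
    # Second adder bank: 16 intermediate 4D errors, one per X/Y pattern.
    inter = {p + q: lo[p] + hi[q] for p, q in product(lo, hi)}
    swap = str.maketrans("XY", "YX")
    result = {}
    for pattern, err in inter.items():
        comp = pattern.translate(swap)          # complementary pattern
        key = min(pattern, comp)                # X-leading name for the pair
        result[key] = min(err, inter[comp])     # minimum-select
    return result


e1d = [{"X": 1, "Y": 4}, {"X": 0, "Y": 1}, {"X": 9, "Y": 0}, {"X": 1, "Y": 1}]
r = errors_4d(e1d)
```

Splitting the sum into 2D partial errors lets the four 2D values per dimension pair be shared among all 16 patterns, which is the saving the two-stage adder arrangement provides.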

[0035] The present invention is further directed to a method for dynamically regulating the power consumption of a high-speed integrated circuit which includes a multiplicity of processing blocks. A first metric and a second metric, which are respectively related to a first performance parameter and a second performance parameter of the integrated circuit, are defined. The first metric is set at a pre-defined value. Selected blocks of the multiplicity of processing blocks are disabled in accordance with a set of pre-determined patterns. The second metric is evaluated while the disabling operation is being performed, to generate a range of values of the second metric, each of which corresponds to the pre-defined value of the first metric. A most desirable value of the second metric is determined from the range of values and is matched to a corresponding pre-determined pattern. The integrated circuit is subsequently operated with selected processing blocks disabled in accordance with the matching pre-determined pattern.

[0036] In particular, the first and second performance parameters are distinct and are chosen from the parametric group consisting of power consumption and a signal quality figure of merit. The signal quality figure of merit is evaluated while selected blocks of the multiplicity of processing blocks are disabled. The set of selected blocks which, when disabled, gives the lowest power consumption while maintaining an acceptable signal quality figure of merit at a pre-defined threshold level is kept in a disabled condition while the integrated circuit is subsequently operated.
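The pattern search reduces to a constrained minimum. A hedged sketch, assuming mean squared error (MSE) as the signal quality figure of merit (lower is better) and a hypothetical `measure` callback that applies a disable pattern and returns the measured power and MSE:

```python
def choose_pattern(patterns, measure, mse_threshold):
    """Return the disable pattern with the lowest power whose MSE is acceptable.

    patterns      -- iterable of disable patterns (hypothetical identifiers)
    measure       -- callback: pattern -> (power_watts, mse), measured while
                     that pattern is applied
    mse_threshold -- largest acceptable MSE (lower MSE means better quality)
    Returns None if no pattern meets the quality threshold.
    """
    best_pattern, best_power = None, float("inf")
    for pattern in patterns:
        power, mse = measure(pattern)
        if mse <= mse_threshold and power < best_power:
            best_pattern, best_power = pattern, power
    return best_pattern


# Hypothetical measurements: pattern -> (watts, MSE)
table = {"none": (5.0, 0.01), "half_echo": (4.2, 0.02), "all_echo": (3.5, 0.08)}
choice = choose_pattern(table, table.__getitem__, 0.05)   # "half_echo"
```

Swapping the roles of the two metrics (minimizing MSE subject to a power ceiling) gives the alternative weighting described in the next paragraph.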

[0037] In one aspect of the present invention, reduced power dissipation is chosen as the most desirable metric to evaluate, while a signal quality figure of merit is accorded secondary consideration. Alternatively, a signal quality figure of merit is chosen as the most desirable metric to evaluate, while power dissipation is accorded secondary consideration. In a further aspect of the present invention, signal quality and power dissipation are accorded equal consideration, with selected blocks of the multiplicity of processing blocks being disabled and the resultant signal quality and power dissipation figures of merit being evaluated so as to define a local maximum of signal quality coexisting with a local minimum of power dissipation.

[0038] In one particular embodiment, the present invention may be characterized as a method for dynamically regulating the power consumption of a communication system which includes at least a first module. The first module can be any circuit block, not necessarily a signal processing block. Power regulation proceeds by specifying a power dissipation value and an error value. An information error metric and a power metric are computed. Activation and deactivation of at least a portion of the first module of the communication system are controlled according to a particular criterion. The criterion is based on at least one of the information error metric, the power metric, the specified error and the specified power, so as to regulate at least one of the information error metric and the power metric.

[0039] In particular, at least a portion of the first module is activated if the information error metric is greater than the specified error, and the first module portion is deactivated if the information error metric is less than the specified error. In an additional aspect of the invention, the first module portion is activated if the information error metric is greater than the specified error and the power metric is smaller than the specified power. The first module portion is deactivated if the information error metric is smaller than the specified error or the power metric is greater than the specified power. In yet a further aspect of the invention, the first module portion is activated if the information error metric is greater than the specified error and is deactivated if the information error metric is smaller than a target value, the target value being smaller than the specified error. In yet another aspect of the invention, the first module portion is activated if the information error metric is greater than the specified error and the power metric is smaller than the specified power. The first module portion is deactivated if the information error metric is smaller than a target value, the target value being smaller than the specified error, or if the power metric is greater than the specified power.
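The last variant behaves as a hysteresis controller: the band between the target value and the specified error keeps the module from toggling on every small fluctuation of the error metric. A minimal sketch of that rule, with hypothetical parameter names:

```python
def update_active(active, err, power, spec_err, spec_power, target_err):
    """Return the new on/off state of the module portion.

    active     -- current state (True = activated)
    err        -- measured information error metric
    power      -- measured power metric
    spec_err   -- specified error
    spec_power -- specified power
    target_err -- target value, required to be smaller than spec_err
    """
    if target_err >= spec_err:
        raise ValueError("target_err must be smaller than spec_err")
    if err > spec_err and power < spec_power:
        return True        # quality too poor and the power budget allows
    if err < target_err or power > spec_power:
        return False       # quality comfortably good, or over power budget
    return active          # inside the hysteresis band: hold current state
```

The activation and deactivation conditions cannot both hold at once, since `err > spec_err` excludes `err < target_err` and `power < spec_power` excludes `power > spec_power`.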

[0040] Advantageously, the information error metric is related to a bit error rate of the communication system, and the information error metric is a measure of performance degradation in the communication system caused by deactivation of the portion of the first module. Where the module is a filter which includes a set of taps, each of the taps including a filter coefficient, the information error metric is a measure of performance degradation of a transceiver caused by operation of the filter.

[0041] Power dissipation reduction is implemented by deactivating subsets of the taps which make up the filter until the performance degradation caused by the truncated filter reaches a pre-determined threshold level.
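One possible truncation policy, sketched under stated assumptions: taps are deactivated in order of increasing coefficient magnitude, and the degradation is estimated as the input power times the sum of the squared deactivated coefficients, which is the added residual error power for a white (uncorrelated) input sequence. Both the ordering and the estimate are illustrative assumptions; the text only requires that deactivation stop at a pre-determined threshold.

```python
def truncate_taps(coeffs, max_degradation, input_power=1.0):
    """Return (enable mask, estimated degradation) for a truncated FIR filter.

    Assumes a white input of variance `input_power`, for which disabling
    tap k adds input_power * coeffs[k]**2 to the residual error power.
    """
    enabled = [True] * len(coeffs)
    degradation = 0.0
    # Visit taps from weakest to strongest coefficient magnitude.
    for k in sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i])):
        cost = input_power * coeffs[k] ** 2
        if degradation + cost > max_degradation:
            break                      # next tap would exceed the threshold
        degradation += cost
        enabled[k] = False             # deactivate this tap
    return enabled, degradation


mask, deg = truncate_taps([0.8, -0.3, 0.05, 0.01, -0.02], 0.004)
```

With these example coefficients, the three smallest taps are switched off at an estimated cost of about 0.003, while the 0.3 and 0.8 taps stay active.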

[0042] The present invention further provides a method for reducing system performance degradation caused by switching noise in a system which includes a set of subsystems. Each of the subsystems includes an analog section and a digital section. Each of the analog sections operates in accordance with a corresponding one of a set of sampling clock signals which are synchronous in frequency. The digital sections operate in accordance with a receive clock signal. The receive clock signal is generated such that it is synchronous in frequency with the sampling clock signals and has a phase offset with respect to one of the sampling clock signals. This phase offset is adjusted such that system performance degradation due to coupling of switching noise from the digital sections to the analog sections is substantially minimized.
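Adjusting the phase offset can be viewed as a one-dimensional search. The sketch below assumes the offset is chosen from `n_phases` evenly spaced steps across an 8 ns symbol period, and that a hypothetical `measure_mse` callback returns the mean squared error observed with that offset applied; the step count and the exhaustive sweep are illustrative choices, not the disclosed procedure.

```python
def best_rclk_phase(n_phases, measure_mse, symbol_period_ns=8.0):
    """Sweep the receive-clock phase offset; return (index, offset_ns, mse).

    measure_mse(offset_ns) -- hypothetical callback that applies the offset
                              and returns the resulting mean squared error
    """
    results = []
    for k in range(n_phases):
        offset_ns = k * symbol_period_ns / n_phases
        results.append((measure_mse(offset_ns), k, offset_ns))
    mse, k, offset_ns = min(results)   # lowest MSE wins; ties go to lower k
    return k, offset_ns, mse


# Toy noise model with its minimum at a 3 ns offset
k, offset, mse = best_rclk_phase(8, lambda t: (t - 3.0) ** 2 + 0.01)
```

In practice each MSE measurement would be averaged over many symbols so that the comparison reflects coupled switching noise rather than random fluctuation.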

BRIEF DESCRIPTION OF THE DRAWINGS

[0043] These and other features, aspects and advantages of the present invention will be more fully understood when considered with respect to the following detailed description, appended claims and accompanying drawings, wherein:

[0044] FIG. 1 is a simplified, semi-schematic block diagram of a high-speed bidirectional communication system exemplified by two transceivers configured to communicate over multiple twisted-pair wiring channels.

[0045] FIG. 2 is a simplified, semi-schematic block diagram of a bidirectional communication transceiver system, constructed in accordance with the present invention.

[0046] FIG. 3 is a simplified, semi-schematic block diagram of an exemplary trellis decoder, including a Viterbi decoder, in accordance with the invention, suitable for decoding signals coded by the exemplary trellis encoder of FIG. 6.

[0047] FIG. 4A illustrates an exemplary PAM-5 constellation and the one-dimensional symbol-subset partitioning.

[0048] FIG. 4B illustrates the eight 4D code-subsets constructed from the one-dimensional symbol-subset partitioning of the constellation of FIG. 4A.

[0049] FIG. 5 illustrates the trellis diagram for the code.

[0050] FIG. 6 is a simplified, semi-schematic block diagram of an exemplary trellis encoder.

[0051] FIG. 7 is a simplified block diagram of a first exemplary embodiment of a structural analog of a 1D slicing function as might be implemented in the Viterbi decoder of FIG. 3.

[0052] FIG. 8 is a simplified block diagram of a second exemplary embodiment of a structural analog of a 1D slicing function as might be implemented in the Viterbi decoder of FIG. 3.

[0053] FIG. 9 is a simplified block diagram of a 2D error term generation machine, illustrating the generation of 2D squared error terms from the 1D squared error terms developed by the exemplary slicers of FIG. 7 or 8.

[0054] FIG. 10 is a simplified block diagram of a 4D error term generation machine, illustrating the generation of 4D squared error terms and the generation of extended path metrics for the 4 extended paths outgoing from state 0.

[0055] FIG. 11 is a simplified block diagram of a 4D symbol generation machine.

[0056] FIG. 12 illustrates the selection of the best path incoming to state 0.

[0057] FIG. 13 is a semi-schematic block diagram illustrating the internal arrangement of a portion of the path memory module of FIG. 3.

[0058] FIG. 14 is a block diagram illustrating the computation of the final decision and the tentative decisions in the path memory module based on the 4D symbols stored in the path memory for each state.

[0059] FIG. 15 is a detailed diagram illustrating the processing of the outputs $V_0^{(i)}$, $V_1^{(i)}$, with $i = 0, \ldots, 7$, and $V_{0F}$, $V_{1F}$, $V_{2F}$ of the path memory module of FIG. 3.

[0060] FIG. 16 shows the word lengths used in one embodiment of this invention.

[0061] FIG. 17 shows an exemplary lookup table suitable for use in computing squared one-dimensional error terms.

[0062] FIGS. 18A and 18B together form an exemplary look-up table which describes the computation of the decisions and squared errors for both the X and Y subsets directly from one component of the 4D Viterbi input of the 1D slicers of FIG. 7.

[0063]FIG. 19 illustrates the general clocking relationship between thetransmitter and the receiver inside each of the four constituenttransceivers 108 of the gigabit Ethernet transceiver (101 or 102) ofFIG. 1;

[0064]FIG. 20 is a simplified block diagram of an embodiment of thetiming recovery system constructed according to the present invention;

[0065] FIG. 21 is a block diagram of an exemplary implementation of the system of FIG. 20.

[0066] FIG. 22 is a block diagram of an exemplary embodiment of the phase reset logic block used for resetting the register of the NCO of FIG. 21 to a specified value.

[0067] FIG. 23 is a block diagram of an exemplary phase shifter logic block used for the phase control of the receive clock signal RCLK.

[0068] FIG. 24 is a flowchart of an embodiment of the process for adjusting the phase of the receive clock signal RCLK.

[0069] FIG. 25A is a first example of clock distribution where the transitions of the four sampling clock signals ACLK0-3 are evenly distributed within the symbol period.

[0070] FIG. 25B is a second example of clock distribution where the transitions of the four sampling clock signals ACLK0-3 are distributed within the symbol period of 8 nanoseconds (ns) such that each ACLK clock transition is 1 ns apart from an adjacent ACLK clock transition.

[0071] FIG. 25C is a third example of clock distribution where the transitions of the four sampling clock signals ACLK0-3 occur at the same instant within the symbol period.

[0072] FIG. 26 is a flowchart of an embodiment of the process for adjusting the phase of a sampling clock signal ACLKx associated with one of the constituent transceivers.

[0073] FIG. 27 is a block diagram of an embodiment of the MSE computation block used for computing the mean squared error of a constituent transceiver.

[0074] FIG. 28 is a simplified matrix diagram illustrating the relationship between power consumption and a performance metric.

[0075] FIG. 29A is a simplified structure diagram of an adaptive FIR filter as might be implemented as an echo/NEXT canceller circuit in one embodiment of a transceiver in accordance with the present invention.

[0076] FIG. 29B is an equivalent structure of the adaptive FIR filter shown in FIG. 29A.

[0077] FIG. 29C is a simplified structure diagram of an alternative adaptive FIR filter including a modification to the structure of FIG. 29B to bypass a deactivated tap.

[0078] FIG. 29D is a simplified block diagram of a deactivate-able coefficient multiplier circuit such as might be implemented in the filters of FIGS. 29A, 29B and 29C.

[0079] FIG. 30 is a flowchart depicting a first exemplary embodiment of an adaptive power reduction method according to the present invention.

[0080] FIG. 31 is a flowchart depicting one exemplary embodiment of an activation block according to the method of FIG. 30.

[0081] FIG. 32 is a flowchart depicting one exemplary embodiment of a deactivation block according to the method of FIG. 30.

[0082] FIG. 33 is a flowchart of one embodiment of the computing block 514 of FIG. 30.

[0083] FIG. 34 is a flowchart depicting one exemplary embodiment of a power-down block according to the method of FIG. 30.

[0084] FIG. 35 is a graph of an exemplary impulse response of the echo characteristics of a typical channel.

[0085] FIG. 36 is a graph of an exemplary impulse response of the near-end crosstalk (NEXT) characteristics of a typical channel.

[0086] FIGS. 37A and 37B are graphs of the mean squared error to signal ratio (MSE/signal), expressed in dB, as a function of time, with time expressed in bauds, of exemplary Master and Slave transceivers, respectively.

[0087] FIGS. 38A and 38B are graphs of the values of the tap coefficients of an exemplary echo canceller as a function of the tap number, after application of the tap power regulating process with the specified error set at −24 dB and −26 dB, respectively.

[0088] FIG. 39 is a block diagram of an exemplary trellis decoder as applied to a case in which there is substantially no intersymbol interference.

[0089] FIG. 40 is a simplified block diagram of an alternative embodiment of the invention in which power consumption is reduced by substitution of a symbol-by-symbol decoder in place of a Viterbi decoder.

DETAILED DESCRIPTION OF THE INVENTION

[0090] In the context of an exemplary integrated circuit-type bidirectional communication system, the present invention might be characterized as a system and method for accommodating efficient, high-speed decoding of signal samples encoded according to the trellis code specified in the IEEE 802.3ab standard (also termed the 1000BASE-T standard).

[0091] As will be understood by one having skill in the art, high-speed data transmission is often limited by the ability of decoder systems to quickly, accurately and effectively process a transmitted symbol within a given time period. In a 1000BASE-T application (aptly termed gigabit), for example, the symbol decode period is typically taken to be approximately 8 nanoseconds. Pertinent to any discussion of symbol decoding is the realization that 1000BASE-T systems are designed to receive 4-dimensional (4D) signals, each dimension corresponding to a respective one of four twisted pair cables and each 1D component taking one of five analog levels. Accordingly, the decoder circuitry portions of transceiver demodulation blocks require a multiplicity of operational steps to be taken in order to effectively decode each symbol. Such a multiplicity of operations is computationally complex and often pushes the switching speeds of the integrated circuit transistors which make up the computational blocks to their fundamental limits.

[0092] In accordance with the present invention, a transceiver decoder is able to substantially reduce the computational complexity of symbol decoding, and thus avoid substantial amounts of propagation delay (i.e., increase operational speed), by making use of truncated (or partial) representations of various quantities that make up the decoding/ISI compensation process.

[0093] Sample slicing is performed in a manner such that one-dimensional (1D) square error terms are developed in a representation having, at most, three bits if the terms signify a Euclidean distance, and one bit if the terms signify a Hamming distance. Truncated 1D error term representation significantly reduces subsequent error processing complexity because fewer bits must be processed.

[0094] Likewise, ISI compensation of sample signals, prior to Viterbi decoding, is performed in a DFE, operatively responsive to tentative decisions made by the Viterbi decoder. Use of tentative decisions, instead of the Viterbi decoder's final decision, reduces system latency by an amount directly related to the path memory sequence distance between the tentative decision used and the final decision. That is, if there are N steps in the path memory from input to final decision output, and latency is a function of N, forcing the DFE with a tentative decision at step N−6 causes latency to become a function of N−6. A trade-off between latency reduction and accuracy may be made by choosing a tentative decision step either closer to the final decision point or closer to the initial point.
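As a minimal sketch of what the DFE does with those decisions, the snippet below estimates postcursor ISI as a weighted sum of past decisions and subtracts it from the current sample. The coefficient values and the sample are hypothetical, and the function names are illustrative only; in the scheme described above the past decisions would be tentative decisions read from the Viterbi path memory rather than final ones.

```python
from typing import Sequence


def dfe_isi_estimate(coeffs: Sequence[float], decisions: Sequence[float]) -> float:
    """Postcursor ISI estimate sum_k b_k * a[n-k].

    coeffs[k-1] is b_k and decisions[k-1] is a[n-k] (newest first). Here the
    decisions stand in for tentative decisions from the path memory.
    """
    return sum(b * a for b, a in zip(coeffs, decisions))


def equalize(sample: float, coeffs: Sequence[float], decisions: Sequence[float]) -> float:
    """Remove the estimated ISI from the current sample before decoding."""
    return sample - dfe_isi_estimate(coeffs, decisions)


# Hypothetical numbers: symbol +1 sent, previous decisions +2 and -1,
# channel postcursors 0.3 and -0.1, so the received sample is 1 + 0.6 + 0.1.
print(round(equalize(1.7, [0.3, -0.1], [2, -1]), 6))  # 1.0
```

Using an earlier (tentative) decision only changes which entries feed `decisions`; the arithmetic stays the same, which is why the latency/accuracy trade-off reduces to choosing the path memory step.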

[0095] Computations associated with removing impairments due to intersymbol interference (ISI) are substantially simplified, in accordance with the present invention, by a combination of techniques that involves the recognition that intersymbol interference results from two primary causes: the partial response pulse shaping filter in the transmitter, and the characteristics of the unshielded twisted pair transmission channel. During the initial start-up, the two ISI components are processed in independent portions of electronic circuitry. ISI caused by the partial response pulse shaping filter is compensated by an inverse partial response filter in a feedforward equalizer (FFE), while ISI caused by transmission channel characteristics is compensated by a decision feedback equalizer (DFE) operating in conjunction with a multiple decision feedback equalizer (MDFE) stage, which provides ISI pre-compensated signals (representing a symbol) to a decoder stage for symbolic decode. Performing the computations necessary for ISI cancellation in this bifurcated manner allows for fast DFE convergence and assists the transceiver in achieving fast acquisition in a robust and reliable manner. After start-up, all ISI is compensated by the combination of the DFE and MDFE.

[0096] In order to appreciate the advantages of the present invention, it will be beneficial to describe the invention in the context of an exemplary bidirectional communication device, such as a gigabit Ethernet transceiver. The particular exemplary implementation chosen is depicted in FIG. 1, which is a simplified block diagram of a multi-pair communication system operating in conformance with the IEEE 802.3ab standard for one gigabit per second (Gb/s) Ethernet full-duplex communication over four twisted pairs of Category-5 copper wires.

[0097] The communication system illustrated in FIG. 1 is represented as a point-to-point system, in order to simplify the explanation, and includes two main transceiver blocks 102 and 104, coupled together with four twisted-pair cables. Each of the wire pairs 112 a, b, c, d is coupled between the transceiver blocks through a respective one of four line interface circuits 106 and communicates information developed by respective ones of four transmitter/receiver circuits (constituent transceivers) 108 coupled between respective interface circuits and a physical coding sublayer (PCS) block 110. The four constituent transceivers 108 are capable of operating simultaneously at 250 megabits per second (Mb/s), and are coupled through respective interface circuits to facilitate full-duplex bidirectional operation. Thus, the one Gb/s communication throughput of each of the transceiver blocks 102 and 104 is achieved by using four 250 Mb/s (125 Megabaud at 2 bits per symbol) constituent transceivers 108 for each of the transceiver blocks and four twisted pairs of copper cables to connect the two transceivers together.
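The rate figures above can be checked with a few lines of arithmetic. This sketch uses only the numbers stated in the text (four pairs, 125 Mbaud per pair, 2 information bits per symbol) and also recovers the roughly 8 ns symbol decode period mentioned earlier.

```python
PAIRS = 4
BAUD_RATE_HZ = 125e6      # symbols per second on each pair
BITS_PER_SYMBOL = 2       # information bits carried per symbol on each pair

per_pair_bps = BAUD_RATE_HZ * BITS_PER_SYMBOL   # rate of one constituent transceiver
aggregate_bps = PAIRS * per_pair_bps            # rate of one transceiver block
symbol_period_ns = 1e9 / BAUD_RATE_HZ           # time budget to decode one symbol

print(f"per pair:      {per_pair_bps / 1e6:.0f} Mb/s")   # 250 Mb/s
print(f"aggregate:     {aggregate_bps / 1e9:.0f} Gb/s")  # 1 Gb/s
print(f"symbol period: {symbol_period_ns:.0f} ns")       # 8 ns
```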

[0098] FIG. 2 is a simplified block diagram of the functional architecture and internal construction of an exemplary transceiver block, indicated generally at 200, such as transceiver 102 of FIG. 1. Since the illustrated transceiver application relates to gigabit Ethernet transmission, the transceiver will be referred to as the “gigabit transceiver”. For ease of illustration and description, FIG. 2 shows only one of the four 250 Mb/s constituent transceivers which are operating simultaneously (termed herein 4-D operation). However, since the operation of the four constituent transceivers is necessarily interrelated, certain blocks and signal lines in the exemplary embodiment of FIG. 2 perform 4-dimensional (4-D) functions and carry 4-D signals, respectively. By 4-D, it is meant that the data from the four constituent transceivers are used simultaneously. In order to clarify signal relationships in FIG. 2, thin lines correspond to 1-dimensional functions or signals (i.e., relating to only a single transceiver), and thick lines correspond to 4-D functions or signals (relating to all four transceivers).

[0099] With reference to FIG. 2, the gigabit transceiver 200 includes a Gigabit Medium Independent Interface (GMII) block 202, a Physical Coding Sublayer (PCS) block 204, a pulse shaping filter 206, a digital-to-analog (D/A) converter 208, a line interface block 210, a highpass filter 212, a programmable gain amplifier (PGA) 214, an analog-to-digital (A/D) converter 216, an automatic gain control block 220, a timing recovery block 222, a pair-swap multiplexer block 224, a demodulator 226, an offset canceler 228, a near-end crosstalk (NEXT) canceler block 230 having three NEXT cancelers, and an echo canceler 232. The gigabit transceiver 200 also includes an A/D first-in-first-out buffer (FIFO) 218 to facilitate proper transfer of data from the analog clock region to the receive clock region, and a FIFO block 234 to facilitate proper transfer of data from the transmit clock region to the receive clock region. The gigabit transceiver 200 can optionally include a filter to cancel far-end crosstalk noise (FEXT canceler).

[0100] On the transmit path, the transmit section of the GMII block 202 receives data from a Media Access Control (MAC) module (not shown in FIG. 2) and passes the digital data to the transmit section 204T of the PCS block 204 via a FIFO 201 in byte-wide format at the rate of 125 MHz. The FIFO 201 is essentially a synchronization buffer device and is provided to ensure proper data transfer from the MAC layer to the physical (PHY) layer, since the transmit clock of the PHY layer is not necessarily synchronized with the clock of the MAC layer. This small FIFO 201 can be constructed with from three to five memory cells to accommodate the elasticity requirement, which is a function of frame size and frequency offset.
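To illustrate why the elasticity requirement depends on frame size and frequency offset, the sketch below computes how many byte periods two free-running clocks slip apart over one frame. The figures used (each clock within ±100 ppm, so up to 200 ppm relative offset, and frames of 64 to 1518 bytes) are assumptions for illustration and are not taken from the text; the text also does not say how the three-to-five cell depth was chosen.

```python
def drift_bytes(frame_bytes: int, offset_ppm: float) -> float:
    """Byte periods of slip accumulated between two clocks over one frame."""
    return frame_bytes * offset_ppm * 1e-6


# Assumed figures (not from the text): 200 ppm worst-case relative offset,
# minimum and maximum untagged Ethernet frame lengths.
for frame in (64, 1518):
    print(frame, round(drift_bytes(frame, 200.0), 4))
```

The slip grows linearly in both arguments, so longer frames or larger clock offsets directly increase the buffering the FIFO must absorb.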

[0101] The transmit section 204T of the PCS block 204 performs scrambling and coding of the data and other control functions. Transmit section 204T of the PCS block 204 generates four 1D symbols, one for each of the four constituent transceivers. The 1D symbol generated for the constituent transceiver depicted in FIG. 2 is filtered by a partial response pulse shaping filter 206 so that the radiated emission of the output of the transceiver may fall within the EMI requirements of the Federal Communications Commission. The pulse shaping filter 206 is constructed with a transfer function 0.75+0.25z⁻¹, such that the power spectrum of the output of the transceiver falls below the power spectrum of a 100BASE-TX signal. 100BASE-TX is a widely used and accepted Fast Ethernet standard for 100 Mb/s operation on two pairs of Category-5 twisted pair cables. The output of the pulse shaping filter 206 is converted to an analog signal by the D/A converter 208 operating at 125 MHz. The analog signal passes through the line interface block 210, and is placed on the corresponding twisted pair cable for communication to a remote receiver.
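A direct-form sketch of the transmit partial response filter 0.75+0.25z⁻¹ follows. It assumes the symbol before the first one is zero; the function name is illustrative. The comment on the frequency response follows from evaluating H(z) at z = 1 and z = −1.

```python
def partial_response(symbols):
    """Apply H(z) = 0.75 + 0.25 z^-1: y[n] = 0.75 a[n] + 0.25 a[n-1], with a[-1] = 0."""
    out, prev = [], 0.0
    for a in symbols:
        out.append(0.75 * a + 0.25 * prev)
        prev = a
    return out


# |H| is 0.75 + 0.25 = 1 at DC and |0.75 - 0.25| = 0.5 at half the symbol rate,
# so the filter attenuates the upper part of the transmit spectrum.
print(partial_response([2, -2, 1, 0]))  # [1.5, -1.0, 0.25, 0.25]
```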

[0102] On the receive path, the line interface block 210 receives an analog signal from the twisted pair cable. The received analog signal is preconditioned by a highpass filter 212 and a programmable gain amplifier (PGA) 214 before being converted to a digital signal by the A/D converter 216 operating at a sampling rate of 125 MHz. Sample timing of the A/D converter 216 is controlled by the output of a timing recovery block 222 controlled, in turn, by decision and error signals from a demodulator 226. The resulting digital signal is properly transferred from the analog clock region to the receive clock region by an A/D FIFO 218, an output of which is also used by an automatic gain control circuit 220 to control the operation of the PGA 214.

[0103] The output of the A/D FIFO 218, along with the outputs from the A/D FIFOs of the other three constituent transceivers, is input to a pair-swap multiplexer block 224. The pair-swap multiplexer block 224 is operatively responsive to a 4D pair-swap control signal, asserted by the receive section 204R of PCS block 204, to sort out the 4 input signals and send the correct signals to the respective demodulators of the 4 constituent transceivers. Since the coding scheme used for the gigabit transceivers 102, 104 (referring to FIG. 1) is based on the fact that each twisted pair of wire corresponds to a 1D constellation, and that the four twisted pairs, collectively, form a 4D constellation, for symbol decoding to function properly, each of the four twisted pairs must be uniquely identified with one of the four dimensions. Any undetected swapping of the four pairs would necessarily result in erroneous decoding. Although described as performed by the receive section 204R of PCS block 204 and the pair-swap multiplexer block 224 in the exemplary embodiment of FIG. 2, the pair-swapping control might alternatively be performed by the demodulator 226.

[0104] Demodulator 226 receives the particular received signal 2 intended for it from the pair-swap multiplexer block 224, and functions to demodulate and decode the signal prior to directing the decoded symbols to the PCS layer 204 for transfer to the MAC. The demodulator 226 includes a multi-component feedforward equalizer (FFE) 26, having its output coupled to a de-skew memory circuit 36 and a trellis decoder 38. The FFE 26 is multi-component in the sense that it includes a pulse shaping filter 28, a programmable inverse partial response (IPR) filter 30, a summing device 32, and an adaptive gain stage 34. Functionally, the FFE 26 might be characterized as a least-mean-squares (LMS) type adaptive filter which performs channel equalization as described in the following.

[0105] Pulse shaping filter 28 is coupled to receive an input signal 2 from the pair-swap MUX 224 and functions to generate a precursor to the input signal 2. Used for timing recovery, the precursor might be aptly described as a zero-crossing inserted at a precursor position of the signal. Such a zero-crossing assists a timing recovery circuit in determining phase relationships between signals, by giving the timing recovery circuit an accurately determinable signal transition point for use as a reference. The pulse shaping filter 28 can be placed anywhere before the decoder block 38. In the exemplary embodiment of FIG. 2, the pulse shaping filter 28 is positioned at the input of the FFE 26.

[0106] The transfer function of the pulse shaping filter 28 may be represented by a function of the form −γ+z⁻¹, with γ equal to 1/16 for short cables (less than 80 meters) and 1/8 for long cables (more than 80 m). The determination of the length of a cable is based on the gain of the coarse PGA section 14 of the PGA 214.
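The receive-side pulse shaping filter −γ+z⁻¹ can be sketched as follows. In the text the cable length is inferred from the coarse PGA gain; this sketch takes the length directly as an input, and treating exactly 80 m as a long cable is an assumption, since the text leaves that case open. Feeding in an impulse shows the small negative sample placed one symbol ahead of the unit main tap, which is the precursor that gives timing recovery its zero crossing.

```python
SHORT_CABLE_LIMIT_M = 80.0


def select_gamma(cable_length_m: float) -> float:
    """gamma = 1/16 below 80 m, 1/8 otherwise (exactly 80 m treated as long: an assumption)."""
    return 1 / 16 if cable_length_m < SHORT_CABLE_LIMIT_M else 1 / 8


def precursor_filter(samples, gamma):
    """Apply -gamma + z^-1: y[n] = -gamma * x[n] + x[n-1], with x[-1] = 0."""
    out, prev = [], 0.0
    for x in samples:
        out.append(-gamma * x + prev)
        prev = x
    return out


# Impulse response for a long (100 m) cable: precursor, then the main tap.
print(precursor_filter([1.0, 0.0, 0.0], select_gamma(100.0)))  # [-0.125, 1.0, 0.0]
```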

[0107] A programmable inverse partial response (IPR) filter 30 is coupled to receive the output of the pulse shaping filter 28, and functions to compensate the ISI introduced by the partial response pulse shaping in the transmitter section of the remote transceiver which transmitted the analog equivalent of the digital signal 2. The IPR filter 30 transfer function may be represented by a function of the form 1/(1+Kz⁻¹) and may also be described as dynamic. In particular, the filter's K value is dynamically varied from an initial non-zero setting, valid at system start-up, to a final setting. K may take any positive value strictly less than 1. In the illustrated embodiment, K might take on a value of about 0.484375 during start-up, and be dynamically ramped down to zero after convergence of the decision feedback equalizer included inside the trellis decoder 38.
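The IPR transfer function 1/(1+Kz⁻¹) corresponds to the recursion y[n] = x[n] − K·y[n−1]. The sketch below implements that recursion and, to show the compensation idea on an ideal channel, runs it on the output of the transmit filter 0.75+0.25z⁻¹ with K = 1/3, because 0.75+0.25z⁻¹ = 0.75(1 + z⁻¹/3). That value of K is chosen purely for this illustration; the start-up value of about 0.484375 given in the text applies to the real receive path and is not derived here. K = 0 reduces the filter to a pass-through, which is the state reached after the ramp-down.

```python
def partial_response(symbols):
    """Transmit filter 0.75 + 0.25 z^-1, repeated here so the demo is self-contained."""
    out, prev = [], 0.0
    for a in symbols:
        out.append(0.75 * a + 0.25 * prev)
        prev = a
    return out


def ipr_filter(samples, k):
    """Inverse partial response 1/(1 + K z^-1): y[n] = x[n] - K * y[n-1]."""
    if not 0.0 <= k < 1.0:
        raise ValueError("K must satisfy 0 <= K < 1")
    out, prev = [], 0.0
    for x in samples:
        y = x - k * prev
        out.append(y)
        prev = y
    return out


symbols = [2, -1, 0, 1, -2]
# On an ideal channel, K = 1/3 recovers the symbols scaled by 0.75.
print([round(y, 6) for y in ipr_filter(partial_response(symbols), 1 / 3)])
```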

[0108] The foregoing is particularly advantageous in high-speed data recovery systems: by compensating the transmitter-induced ISI at start-up, prior to decoding, it reduces the processing required of the decoder to that needed only for compensating transmission-channel-induced ISI. This “bifurcated” or divided ISI compensation process allows for fast acquisition in a robust and reliable manner. After DFE convergence, noise enhancement in the feedforward equalizer 26 is avoided by dynamically ramping the feedback gain factor K of the IPR filter 30 to zero, effectively removing the filter from the active computational path.

[0109] A summing device 32 subtracts from the output of the IPR filter 30 the signals received from the offset canceler 228, the NEXT cancelers 230, and the echo canceler 232. The offset canceler 228 is an adaptive filter which generates an estimate of the offset introduced at the analog front end, which includes the PGA 214 and the A/D converter 216. Likewise, the three NEXT cancelers 230 are adaptive filters used for modeling the NEXT impairments in the received signal caused by the symbols sent by the three local transmitters of the other three constituent transceivers. The impairments are due to a near-end crosstalk mechanism between the pairs of cables. Since each receiver has access to the data transmitted by the other three local transmitters, it is possible to nearly replicate the NEXT impairments through filtering. Referring to FIG. 2, the three NEXT cancelers 230 filter the signals sent by the PCS block 204 to the other three local transmitters and produce three signals replicating the respective NEXT impairments. By subtracting these three signals from the output of the IPR filter 30, the NEXT impairments are approximately canceled.

[0110] Due to the bidirectional nature of the channel, each local transmitter causes an echo impairment on the received signal of the local receiver with which it is paired to form a constituent transceiver. The echo canceler 232 is an adaptive filter used for modeling the echo impairment. The echo canceler 232 filters the signal sent by the PCS block 204 to the local transmitter associated with the receiver, and produces a replica of the echo impairment. By subtracting this replica signal from the output of the IPR filter 30, the echo impairment is approximately canceled.
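The work done by the summing device 32 can be sketched as below. This is a simplified model, assuming each canceller is an FIR filter over the history of locally transmitted symbols and the offset canceller output is a single constant estimate. Coefficients are held fixed here, since adaptation is a separate step, and all numbers are hypothetical.

```python
from typing import Sequence


def fir(coeffs: Sequence[float], history: Sequence[float]) -> float:
    """Replica sum_k c_k * x[n-k]; history[0] is the newest local symbol."""
    return sum(c * x for c, x in zip(coeffs, history))


def cancel(received, echo_coeffs, own_history, next_coeffs, other_histories, offset):
    """Subtract the offset estimate, the echo replica and the three NEXT replicas."""
    impairment = offset + fir(echo_coeffs, own_history)
    for coeffs, history in zip(next_coeffs, other_histories):
        impairment += fir(coeffs, history)
    return received - impairment


# Hypothetical numbers: far-end contribution 1.0, echo 0.5, net NEXT 0.0, offset 0.05.
rx = 1.0 + 0.5 + 0.0 + 0.05
print(round(cancel(rx, [0.5, 0.25], [2, -2],
                   [[0.1], [0.0], [-0.1]], [[1], [2], [1]], 0.05), 6))  # 1.0
```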

[0111] Following NEXT, echo and offset cancellation, the signal is coupled to an adaptive gain stage 34 which functions to fine-tune the gain of the signal path using a zero-forcing LMS algorithm. Since this adaptive gain stage 34 trains on the basis of errors of the adaptive offset, NEXT and echo cancellation filters 228, 230 and 232, respectively, it provides a more accurate signal gain than the PGA 214.
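A scalar sketch of a zero-forcing LMS gain update is shown below. In this form the error (gain × sample − decision) is correlated with the decision rather than with the input sample. The symbol pattern, step size and channel attenuation are hypothetical, decisions are assumed correct, and no noise is modeled, so the gain settles at the reciprocal of the channel attenuation.

```python
import itertools


def zf_gain_update(gain, sample, decision, mu):
    """One zero-forcing LMS step on a scalar gain."""
    error = gain * sample - decision
    return gain - mu * error * decision


def settle_gain(channel_gain=0.8, mu=0.05, steps=200):
    gain = 1.0
    for a in itertools.islice(itertools.cycle([2, -1, 1, -2, 0]), steps):
        sample = channel_gain * a                   # noiseless, attenuated sample
        gain = zf_gain_update(gain, sample, a, mu)  # decisions assumed correct
    return gain


print(round(settle_gain(), 4))  # 1.25, i.e. 1 / channel_gain
```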

[0112] The output of the adaptive gain stage 34, which is also the output of the FFE 26, is input to a de-skew memory 36. The de-skew memory 36 is a four-dimensional function block, i.e., it also receives the outputs of the three FFEs of the other three constituent transceivers as well as the output of FFE 26 illustrated in FIG. 2. There may be a relative skew in the outputs of the 4 FFEs, which are the 4 signal samples representing the 4 symbols to be decoded. This relative skew can be up to 50 nanoseconds, and is due to the variations in the way the copper wire pairs are twisted. In order to correctly decode the four symbols, the four signal samples must be properly aligned. The de-skew memory is responsive to a 4D de-skew control signal asserted by the PCS block 204 to de-skew and align the four signal samples received from the four FFEs. The four de-skewed signal samples are then directed to the trellis decoder 38 for decoding.
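At 125 Mbaud a 50 ns skew spans ceil(50 / 8) = 7 symbol periods, which bounds the per-lane delay a de-skew memory would need. The sketch below models the memory as four delay lines. The per-lane delays are passed in directly, whereas in the text they are set through the PCS de-skew control signal; the class name and interface are hypothetical.

```python
import math
from collections import deque

SYMBOL_PERIOD_NS = 8.0
MAX_SKEW_NS = 50.0
MAX_SKEW_SYMBOLS = math.ceil(MAX_SKEW_NS / SYMBOL_PERIOD_NS)  # 7


class DeskewMemory:
    """Delay each lane by a fixed number of symbols so the four lanes line up."""

    def __init__(self, delays):
        if any(not 0 <= d <= MAX_SKEW_SYMBOLS for d in delays):
            raise ValueError("delay exceeds the de-skew range")
        self.lines = [deque([0.0] * d) for d in delays]

    def push(self, samples):
        """Insert one sample per lane and return the aligned 4D output."""
        out = []
        for line, s in zip(self.lines, samples):
            line.append(s)
            out.append(line.popleft())
        return out


# Lane i arrives lateness[i] symbols late; delay the early lanes by the difference.
lateness = [0, 3, 1, 2]
mem = DeskewMemory([max(lateness) - s for s in lateness])
for n in range(6):
    print(mem.push([float(n - s) for s in lateness]))
```

Once the longest delay line has filled (here after three symbols), every output vector holds four samples of the same 4D symbol.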

[0113] Data received at the local transceiver was encoded, prior to transmission by a remote transceiver, using an 8-state four-dimensional trellis code. In the absence of inter-symbol interference (ISI), a proper 8-state Viterbi decoder would provide optimal decoding of this code. However, in the case of Gigabit Ethernet, the Category-5 twisted pair cable introduces a significant amount of ISI. In addition, as was described above in connection with the FFE stage 26, the partial response filter of the remote transmitter on the other end of the communication channel also contributes a certain component of ISI. Therefore, during nominal operation, the trellis decoder 38 must both decode the trellis code and compensate for at least transmission channel induced ISI, at a substantially high computational rate corresponding to a symbol rate of about 125 MHz.

[0114] In the illustrated embodiment of the gigabit transceiver of FIG. 2, the trellis decoder 38 suitably includes an 8-state Viterbi decoder for symbol decoding, and incorporates circuitry which implements a decision-feedback sequence estimation approach in order to compensate the ISI components perturbing the signal which represents transmitted symbols. The 4D output 40 of the trellis decoder 38 is provided to the receive section 204R of the PCS block. The receive section 204R of the PCS block de-scrambles and further decodes the symbol stream and then passes the decoded packets and idle stream to the receive section of the GMII block 202 for transfer to the MAC module.

[0115] The 4D outputs 42 and 44, which represent the error and tentative decision signals defined by the decoder, respectively, are provided to the timing recovery block 222, whose output controls the sampling time of the A/D converter 216. One of the four components of the error 42 and one of the four components of the tentative decision 44 correspond to the signal stream pertinent to the particular receiver section illustrated in FIG. 2, and are provided to the adaptive gain stage 34 to adjust the gain of the signal path.

[0116] The component 42A of the 4D error 42, which corresponds to the receiver shown in FIG. 2, is further provided to the adaptation circuitry of each of the adaptive offset, NEXT and echo cancellation filters 228, 230, 232. The adaptation circuitry evaluates the content of the error component and, initially, trains the filter to develop suitable filter coefficient values. During nominal operation, the adaptation circuitry monitors the error component and provides periodic updates to the filter coefficients in response thereto.
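The text does not give the update rule used by the adaptation circuitry. As one common choice, the sketch below applies the standard LMS update c_k ← c_k + μ·e[n]·x[n−k] to an FIR echo replica. To keep it self-contained it assumes there is no far-end signal and no noise, so the error is simply the residual echo. In the transceiver, the error would come from the decoder's error component 42A. The echo path, step size and random PAM-5 test symbols are all hypothetical.

```python
import random


def adapt_echo_canceller(symbols, echo_path, mu=0.01):
    """LMS adaptation of an FIR echo replica: c_k <- c_k + mu * e[n] * x[n-k]."""
    taps = len(echo_path)
    coeffs = [0.0] * taps
    history = [0.0] * taps                  # history[k] holds x[n-k]
    for x in symbols:
        history = [x] + history[:-1]
        echo = sum(h * v for h, v in zip(echo_path, history))    # true echo
        replica = sum(c * v for c, v in zip(coeffs, history))    # canceller output
        error = echo - replica              # residual echo after cancellation
        coeffs = [c + mu * error * v for c, v in zip(coeffs, history)]
    return coeffs


rng = random.Random(1)
tx = [rng.choice((-2, -1, 0, 1, 2)) for _ in range(5000)]
print([round(c, 3) for c in adapt_echo_canceller(tx, [0.4, -0.2, 0.1])])
```

With a noiseless residual as the error signal, the coefficients converge to the echo path. With real received data the far-end signal and noise keep the coefficients fluctuating around it, which is why updates continue during nominal operation.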

[0117] As implemented in the exemplary Ethernet gigabit transceiver, the trellis decoder 38 functions to decode symbols that have been encoded in accordance with the trellis code specified in the IEEE 802.3ab standard (1000BASE-T, or gigabit). As mentioned above, information signals are communicated between transceivers at a symbol rate of about 125 MHz on each of the pairs of twisted copper cables that make up the transmission channel. In accordance with established Ethernet communication protocols, information signals are modulated for transmission in accordance with a 5-level Pulse Amplitude Modulation (PAM-5) modulation scheme. Thus, since information signals are represented by five amplitude levels, it will be understood that symbols can be expressed in a three-bit representation on each twisted wire pair.

[0118] Turning now to FIGS. 4A and 4B, an exemplary PAM-5 constellation is depicted in FIG. 4A, which also depicts the one-dimensional symbol subset partitioning within the constellation. As illustrated in FIG. 4A, the constellation is a representation of five amplitude levels, +2, +1, 0, −1, −2, in decreasing order. Symbol subset partitioning occurs by dividing the five levels into two 1D subsets, X and Y, and assigning X and Y subset designations to the five levels on an alternating basis. Thus +2, 0 and −2 are assigned to the Y subset; +1 and −1 are assigned to the X subset. The partitioning could, of course, be reversed, with +1 and −1 being assigned a Y designation.

[0119] It should be recognized that although the X and Y subsets represent different absolute amplitude levels, the distance between neighboring amplitudes within each subset is the same, i.e., two (2). The X subset therefore includes amplitude level designations which differ by a value of two (−1, +1), as does the Y subset (−2, 0, +2). This partitioning offers certain advantages to slicer circuitry in a decoder, as will be developed further below.
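Because neighboring points in each subset are 2 apart, any input in [−2, +2] lies within distance 1 of the nearest point of either subset. The squared 1D error to each subset is therefore confined to [0, 1] over the nominal range, which is why a few bits suffice, as with the three-bit error terms mentioned in [0093]. The sketch below slices one input against both subsets. The uniform, saturating 3-bit code for the squared error is a hypothetical scale chosen for illustration; the text does not specify the actual representation.

```python
X_LEVELS = (-1, 1)        # X subset
Y_LEVELS = (-2, 0, 2)     # Y subset


def slice_1d(v, levels):
    """Nearest level in one subset and the squared 1D error to it."""
    decision = min(levels, key=lambda a: abs(v - a))
    return decision, (v - decision) ** 2


def sq_error_code(e2, bits=3):
    """Hypothetical saturating uniform code for e2 over [0, 1] (0..7 for 3 bits)."""
    top = (1 << bits) - 1
    return min(top, int(e2 * top + 0.5))


v = 0.7
for name, levels in (("X", X_LEVELS), ("Y", Y_LEVELS)):
    d, e2 = slice_1d(v, levels)
    print(name, d, round(e2, 2), sq_error_code(e2))
```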

[0120] In FIG. 4B, the 1D subsets have been combined into 4D subsets representing the four twisted pairs of the transmission channel. Since the 1D subset definition is binary (X:Y) and there are four wire pairs, there are sixteen possible combinations of 4D subsets. These sixteen possible combinations are assigned into eight 4D subsets, s0 to s7 inclusive, in accordance with a trellis coding scheme. Each of the 4D subsets (also termed code subsets) is constructed of a union of two complementary 4D sub-subsets, e.g., code-subset three (identified as s3) is the union of sub-subset X:X:Y:X and its complementary image Y:Y:X:Y.
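The pairing of the sixteen X/Y patterns into eight complementary unions can be enumerated mechanically, as sketched below. This only groups each pattern with its complement. Which union receives which label s0 to s7 is fixed by the trellis code and is not reproduced here, apart from the two cases the text names (s0 = XXXX ∪ YYYY and s3 = XXYX ∪ YYXY).

```python
from itertools import product


def complement(pattern: str) -> str:
    """Swap X and Y in every position, e.g. XXYX -> YYXY."""
    return pattern.translate(str.maketrans("XY", "YX"))


patterns = ["".join(p) for p in product("XY", repeat=4)]               # 16 patterns
pairs = sorted({tuple(sorted((p, complement(p)))) for p in patterns})  # 8 unions
for a, b in pairs:
    print(f"{a} U {b}")
```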

[0121] Data being processed for transmission is encoded using the above-described 4-dimensional (4D) 8-state trellis code, in an encoder circuit such as the one illustrated in the exemplary block diagram of FIG. 6, according to an encoding algorithm specified in the 1000BASE-T standard. Referring to FIG. 6, an exemplary encoder 300, which is commonly provided in the transmit PCS portion of a gigabit transceiver, might be represented in simplified form as a convolutional encoder 302 in combination with a signal mapper 304. Data received by the transmit PCS from the MAC module via the transmit gigabit medium independent interface are encoded with control data and scrambled, resulting in an eight-bit data word represented by input bits D₀ through D₇, which are introduced to the signal mapper 304 of the encoder 300 at a data rate of about 125 MHz. The two least significant bits, D₀ and D₁, are also input, in parallel fashion, into the convolutional encoder 302, implemented as a linear feedback shift register, in order to generate a redundancy bit C, which is necessary to provide the coding gain of the code.

[0122] As described above, the convolutional encoder 302 is a linear feedback shift register, constructed of three delay elements 303, 304 and 305 (conventionally denoted by z⁻¹) interspersed with and separated by two summing circuits 307 and 308, which function to combine the two least significant bits (LSBs), D₀ and D₁, of the input word with the output of the first and second delay elements, 303 and 304, respectively. The two time sequences formed by the streams of the two LSBs are convolved with the coefficients of the linear feedback shift register to produce the time sequence of the redundancy bit C. Thus, the convolutional encoder might be viewed as a state machine.
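The following is a state-machine sketch of a three-delay feedback shift register of the kind described. The exact wiring is an assumption, one reading of the description above: D₀ is XORed with the first delay's output, D₁ with the second delay's output, the third delay feeds back into the first, and C is taken from the third delay. The subset numbering 4·D₀ + 2·D₁ + C is also assumed. These choices were made because they reproduce the example transitions listed for FIG. 5 in [0128]. This is not claimed to match the encoder in the 1000BASE-T standard bit for bit.

```python
class ConvEncoder:
    """Hypothetical 8-state feedback shift register producing the redundancy bit C.

    States are numbered 4*r1 + 2*r2 + r3, where r1..r3 are the three delay outputs.
    """

    def __init__(self, state: int = 0):
        self.r1, self.r2, self.r3 = (state >> 2) & 1, (state >> 1) & 1, state & 1

    @property
    def state(self) -> int:
        return 4 * self.r1 + 2 * self.r2 + self.r3

    def step(self, d0: int, d1: int) -> int:
        """Clock in one (D0, D1) pair; return C for the current symbol."""
        c = self.r3
        self.r1, self.r2, self.r3 = self.r3, self.r1 ^ d0, self.r2 ^ d1
        return c


def subset_index(d0: int, d1: int, c: int) -> int:
    """Hypothetical mapping of (D1, D0, C) to code subset s0..s7."""
    return 4 * d0 + 2 * d1 + c


enc = ConvEncoder(0)
c = enc.step(d0=0, d1=1)
print(f"s{subset_index(0, 1, c)} takes state 0 to state {enc.state}")  # s2 ... state 1
```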

[0123] The signal mapper 304 maps the 9 bits (D₀-D₇ and C) into a particular 4-dimensional constellation point. Each of the four dimensions uniquely corresponds to one of the four twisted wire pairs. In each dimension, the possible symbols are from the symbol set {−2, −1, 0, +1, +2}. The symbol set is partitioned into two disjoint symbol subsets X and Y, with X={−1, +1} and Y={−2, 0, +2}, as described above and shown in FIG. 4A.

[0124] Referring to FIG. 4B, the eight code subsets s0 through s7 define the constellation of the code in the signal space. Each of the code subsets is formed by the union of two code sub-subsets, each of the code sub-subsets being formed by 4D patterns obtained from concatenation of symbols taken from the symbol subsets X and Y. For example, the code subset s0 is formed by the union of the 4D patterns from the 4D code sub-subsets XXXX and YYYY. It should be noted that the minimum distance between any two arbitrary even (respectively, odd) code subsets is $\sqrt{2}$. It should be further noted that each of the code subsets is able to define at least 72 constellation points. However, only 64 constellation points in each code subset are recognized as codewords of the trellis code specified in the 1000BASE-T standard.
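The "at least 72 constellation points" figure follows from counting. A pattern with k X positions has 2^k · 3^(4−k) points, because |X| = 2 and |Y| = 3, and each code subset adds its complementary pattern. The sketch below does that count for all eight unions, using only the subset definitions given in the text.

```python
from itertools import product

SIZE = {"X": 2, "Y": 3}   # |X| = |{-1, +1}|, |Y| = |{-2, 0, +2}|


def complement(pattern: str) -> str:
    return pattern.translate(str.maketrans("XY", "YX"))


def points(pattern: str) -> int:
    """Number of 4D points in one sub-subset, e.g. XXYX -> 2 * 2 * 3 * 2 = 24."""
    n = 1
    for s in pattern:
        n *= SIZE[s]
    return n


counts = {}
for p in ("".join(t) for t in product("XY", repeat=4)):
    key = min(p, complement(p))
    counts[key] = points(p) + points(complement(p))

print(sorted(set(counts.values())))   # [72, 78, 97]
print(min(counts.values()))           # 72
```

Unions built from two-X patterns give exactly 72 points (36 + 36), so each code subset has at least the 64 codewords retained in the pruned constellation.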

[0125] This reduced constellation is termed the pruned constellation. Hereinafter, the term “codeword” is used to indicate a 4D symbol that belongs to the pruned constellation. A valid codeword is part of a valid path in the trellis diagram.

[0126] Referring now to FIG. 6 and with reference to FIGS. 4A and 4B, in operation, the signal mapper 304 uses the 3 bits D₁, D₀ and C to select one of the code subsets s0-s7, and uses the 6 most significant bits of the input signal, D₂-D₇, to select one of 64 particular points in the selected code subset. These 64 particular points of the selected code subset correspond to codewords of the trellis code. The signal mapper 304 outputs the selected 4D constellation point 306, which will be placed on the four twisted wire pairs after pulse shape filtering and digital-to-analog conversion.
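The split of the 9 mapper input bits can be sketched as follows: three bits select one of 8 code subsets and six bits select one of 64 codewords. The bit ordering used here (subset = 4·D₀ + 2·D₁ + C, point = D₇..D₂) is hypothetical, since the text does not give the exact mapping.

```python
def split_word(d: int, c: int):
    """Split D7..D0 and C into (code subset index, point index within the subset).

    Hypothetical ordering: subset = 4*D0 + 2*D1 + C (3 bits, 8 subsets),
    point = D7..D2 (6 bits, 64 codewords per subset).
    """
    d0, d1 = d & 1, (d >> 1) & 1
    return 4 * d0 + 2 * d1 + c, (d >> 2) & 0x3F


print(split_word(0b10110110, 1))   # (3, 45)
```

Every (D₇..D₀, C) combination maps to a distinct (subset, point) pair, which matches 8 × 64 = 512 = 2⁹ codewords in total.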

[0127] FIG. 5 shows the trellis diagram for the trellis code specified in the 1000BASE-T standard. In the trellis diagram, each vertical column of nodes represents the possible states that the encoder 300 (FIG. 6) can assume at a point in time. It is noted that the states of the encoder 300 are dictated by the states of the convolutional encoder 302 (FIG. 6). Since the convolutional encoder 302 has three delay elements, there are eight distinct states. Successive columns of nodes represent the possible states that might be defined by the convolutional encoder state machine at successive points in time.

[0128] Referring to FIG. 5, the eight distinct states of the encoder 300 are identified by numerals 0 through 7, inclusive. From any given current state, each subsequent transmitted 4D symbol must correspond to a transition of the encoder 300 from the given state to a permissible successor state. For example, from the current state 0 (respectively, from current states 2, 4, 6), a transmitted 4D symbol taken from the code subset s0 corresponds to a transition to the successor state 0 (respectively, to successor states 1, 2 or 3). Similarly, from current state 0, a transmitted 4D symbol taken from code subset s2 (respectively, code subsets s4, s6) corresponds to a transition to successor state 1 (respectively, successor states 2, 3).

[0129] Examination of the trellis diagram of FIG. 5 shows that from any even state (i.e., states 0, 2, 4 or 6), valid transitions can only be made to certain ones of the successor states, i.e., states 0, 1, 2 or 3. From any odd state (states 1, 3, 5 or 7), valid transitions can only be made to the remaining successor states, i.e., states 4, 5, 6 or 7. Each transition in the trellis diagram, also called a branch, may be thought of as being characterized by the predecessor state (the state it leaves), the successor state (the state it enters) and the corresponding transmitted 4D symbol. A valid sequence of states is represented by a path through the trellis which follows the above-noted rules. A valid sequence of states corresponds to a valid sequence of transmitted 4D symbols.
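The even/odd transition rule can be captured in a few lines. The following Python sketch is illustrative only: it encodes just the state connectivity stated above (not which code subset labels each branch), and the function names are hypothetical.

```python
NUM_STATES = 8  # three delay elements in the convolutional encoder -> 2**3 states

def successors(state):
    """Permissible successor states: even states go to 0-3, odd states to 4-7."""
    return (0, 1, 2, 3) if state % 2 == 0 else (4, 5, 6, 7)

def predecessors(state):
    """States from which `state` can be entered (its incoming branches)."""
    return [p for p in range(NUM_STATES) if state in successors(p)]

def is_valid_state_sequence(states):
    """True if every consecutive pair of states is an allowed trellis branch."""
    return all(b in successors(a) for a, b in zip(states, states[1:]))

# predecessors(0) -> [0, 2, 4, 6]; predecessors(5) -> [1, 3, 5, 7]
# is_valid_state_sequence([0, 1, 4, 2]) -> True
```

Every state therefore has exactly four predecessors, which is why four extended paths compete for each new node during decoding.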

[0130] At the receiving end of the communication channel, the trellis decoder 38 uses the methodology represented by the trellis diagram of FIG. 5 to decode a sequence of received signal samples into their symbolic representation, in accordance with the well-known Viterbi algorithm. A traditional Viterbi decoder processes information signals iteratively, on an information-frame-by-information-frame basis (in the Gigabit Ethernet case, each information frame is a 4D received signal sample corresponding to a 4D symbol), tracing through a trellis diagram corresponding to the one used by the encoder, in an attempt to emulate the encoder's behavior. At any particular frame time, the decoder does not know which node (or state) the encoder has reached; thus, it does not try to decode the node at that particular frame time. Instead, given the received sequence of signal samples, the decoder calculates the most likely path to every node and determines the distance between each such path and the received sequence. This distance is called the path metric.

[0131] In the next frame time, the decoder determines the most likely path to each of the new nodes of that frame time. To get to any one of the new nodes, a path must pass through one of the old nodes. Possible paths to each new node are obtained by extending to this new node each of the old paths that are allowed to be thus extended, as specified by the trellis diagram. In the trellis diagram of FIG. 5, there are four possible paths to each new node. For each new node, the extended path with the smallest path metric is selected as the most likely path to this new node.

[0132] By continuing the above path-extending process, the decoder determines a set of surviving paths to the set of nodes at the nth frame time. If all of the paths pass through the same node at the first frame time, then the traditional decoder knows the most likely node the encoder entered at the first frame time, regardless of which node the encoder entered at the nth frame time. In other words, the decoder knows how to decode the received information associated with the first frame time, even though it has not yet made a decision for the received information associated with the nth frame time. At the nth frame time, the traditional decoder examines all surviving paths to see if they pass through the same first branch in the first frame time. If they do, then the valid symbol associated with this first branch is output by the decoder as the decoded information frame for the first frame time. Then, the decoder drops the first frame and takes in a new frame for the next iteration. Again, if all surviving paths pass through the same node of the oldest surviving frame, then this information frame is decoded. The decoder continues this frame-by-frame decoding process indefinitely, so long as information is received.

[0133] The number of symbols that the decoder can store is called the decoding-window width. The decoder must have a decoding window wide enough to ensure that a well-defined decision will almost always be made at a frame time. As discussed later in connection with FIGS. 13 and 14, the decoding window width of the trellis decoder 38 of FIG. 2 is 10 symbols. This length of the decoding window was selected based on the results of computer simulation of the trellis decoder 38.

[0134] A decoding failure occurs when not all of the surviving paths to the set of nodes at frame time n pass through a common first branch at frame time 0. In such a case, the traditional decoder would defer making a decision and would continue tracing deeper in the trellis. This would cause unacceptable latency for a high-speed system such as the gigabit Ethernet transceiver. Unlike the traditional decoder, the trellis decoder 38 of the present invention does not check whether the surviving paths pass through a common first branch. Rather, the trellis decoder, in accordance with the invention, assumes that the surviving paths at frame time n pass through such a branch, and outputs a decision for frame time 0 on the basis of that assumption. If this decision is incorrect, the trellis decoder 38 will necessarily output a few additional incorrect decisions based on the initial perturbation, but will soon recover due to the nature of the particular relationship between the code and the characteristics of the transmission channel. It should further be noted that this potential source of error is relatively insignificant in practice, since the assumption made by the trellis decoder 38, that all the surviving paths at frame time n pass through a common first branch at frame time 0, is correct with very high probability.
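The contrast between the traditional merge check and the assumption made by the trellis decoder 38 can be sketched as follows. This is a Python illustration under stated assumptions: each survivor is a list of hashable symbols ordered oldest first, and both function names are hypothetical.

```python
def traditional_decision(survivors):
    """Traditional rule: release the oldest symbol only if every survivor
    agrees on it; return None (meaning 'defer the decision') otherwise."""
    oldest = {path[0] for path in survivors}
    return next(iter(oldest)) if len(oldest) == 1 else None

def forced_decision(survivors, path_metrics):
    """Assumed-merge rule: release the oldest symbol of the survivor with
    the lowest path metric, without checking whether the survivors merged."""
    best = min(range(len(survivors)), key=lambda i: path_metrics[i])
    return survivors[best][0]

# survivors = [["a", "b"], ["a", "c"], ["z", "d"]]
# traditional_decision(survivors)          -> None (would have to wait)
# forced_decision(survivors, [3, 1, 5])    -> "a"
```

The forced rule always produces an output after a fixed delay, which is what bounds the latency; when the assumption is wrong, the error shows up as a short burst rather than an unbounded wait.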

[0135] FIG. 3 is a simplified block diagram of the construction details of an exemplary trellis decoder such as described in connection with FIG. 2. The exemplary trellis decoder (again indicated generally at 38) is constructed to include a multiple decision feedback equalizer (MDFE) 602, Viterbi decoder circuitry 604, a path metrics module 606, a path memory module 608, a select logic 610, and a decision feedback equalizer 612. In general, a Viterbi decoder is often thought of as including the path metrics module and the path memory module. However, because of the unique arrangement and functional operation of the elements of the exemplary trellis decoder 38, the functional element which performs the slicing operation will be referred to herein as Viterbi decoder circuitry, a Viterbi decoder, or colloquially a Viterbi.

[0136] The Viterbi decoder circuitry 604 performs 4D slicing of signals received at the Viterbi inputs 614, and computes the branch metrics. A branch metric, as the term is used herein, is well known and refers to the metric associated with an elemental path (a branch) between neighboring trellis nodes. A plurality of branch metrics will thus be understood to make up a path metric. An extended path metric will be understood to refer to a path metric which is extended by a next branch metric to thereby form an extension to the path. Based on the branch metrics and the previous path metrics information 618 received from the path metrics module 606, the Viterbi decoder 604 extends the paths and computes the extended path metrics 620, which are returned to the path metrics module 606. The Viterbi decoder 604 selects the best path incoming to each of the eight states, and updates the path memory stored in the path memory module 608 and the path metrics stored in the path metrics module 606.

[0137] In the traditional Viterbi decoding algorithm, the inputs to a decoder are the same for all the states of the code. Thus, a traditional Viterbi decoder would have only one 4D input for a 4D 8-state code. In contrast, and in accordance with the present invention, the inputs 614 to the Viterbi decoder 604 are different for each of the eight states. This is because the Viterbi inputs 614 are defined by feedback signals generated by the MDFE 602, and these feedback signals are different for each of the eight paths (one path per state) of the Viterbi decoder 604, as will be discussed later.

[0138] There are eight Viterbi inputs 614 and eight Viterbi decisions 616, each corresponding to a respective one of the eight states of the code. Each of the eight Viterbi inputs 614, and each of the decision outputs 616, is a 4-dimensional vector whose four components are the Viterbi inputs and decision outputs for the four constituent transceivers, respectively. In other words, the four components of each of the eight Viterbi inputs 614 are associated with the four pairs of the Category-5 cable. Together, the four components form a received word that corresponds to a valid codeword. From the foregoing, it should be understood that detection (decoding, demodulation, and the like) of information signals in a gigabit system is inherently computationally intensive. When it is further realized that received information must be detected at a very high speed and in the presence of ISI channel impairments, the difficulty in achieving robust and reliable signal detection becomes apparent.

[0139] In accordance with the present invention, the Viterbi decoder 604 detects a non-binary word by first producing a set of one-dimensional (1D) decisions and a corresponding set of 1D errors from the 4D inputs. By combining the 1D decisions with the 1D errors, the decoder produces a set of 4D decisions and a corresponding set of 4D errors. Hereinafter, this generation of 4D decisions and errors from the 4D inputs is referred to as 4D slicing. Each of the 1D errors represents the distance metric between one 1D component of the eight 4D inputs and a symbol in one of the two disjoint symbol-subsets X, Y. Each of the 4D errors is the distance between the received word and the corresponding 4D decision, which is the codeword nearest to the received word with respect to one of the code-subsets si, where i = 0, . . . , 7.

[0140] 4D errors may also be characterized as the branch metrics in the Viterbi algorithm. The branch metrics are added to the previous values of path metrics 618 received from the path metrics module 606 to form the extended path metrics 620, which are then stored in the path metrics module 606, replacing the previous path metrics. For any one given state of the eight states of the code, there are four incoming paths. For a given state, the Viterbi decoder 604 selects the best path, i.e., the path having the lowest metric of the four paths incoming to that state, and discards the other three paths. The best path is saved in the path memory module 608. The metric associated with the best path is stored in the path metrics module 606, replacing the previous value of the path metric stored in that module.
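The add-compare-select step of paragraph [0140] can be sketched in software as follows, assuming the branch metrics are supplied as an 8-by-8 table indexed [predecessor][successor], of which only the allowed entries are read. This is a hypothetical Python illustration rather than the circuit of FIGS. 10 and 12; ties are resolved in favor of the lower-numbered predecessor, which is an arbitrary choice here.

```python
import math

def successors(state):
    """Even states branch to 0-3, odd states to 4-7 (FIG. 5 rule)."""
    return (0, 1, 2, 3) if state % 2 == 0 else (4, 5, 6, 7)

def add_compare_select(path_metrics, branch_metrics):
    """One Viterbi iteration over the 8-state trellis.

    path_metrics[p]: metric of the survivor currently ending in state p.
    branch_metrics[p][s]: branch metric (4D error) of the branch p -> s.
    Returns (new_metrics, select), where select[s] is the predecessor whose
    extended path survives into state s.
    """
    new_metrics = [math.inf] * 8
    select = [None] * 8
    for p in range(8):
        for s in successors(p):
            extended = path_metrics[p] + branch_metrics[p][s]  # add
            if extended < new_metrics[s]:                      # compare
                new_metrics[s] = extended                      # select
                select[s] = p
    return new_metrics, select
```

Each state ends up comparing exactly four extended paths, and the other three are discarded, matching the description above.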

[0141] In the following, the 4D slicing function of the Viterbi decoder 604 will be described in detail. 4D slicing may be described as being performed in three sequential steps. In a first step, a set of 1D decisions and corresponding 1D errors are generated from the 4D Viterbi inputs. Next, the 1D decisions and 1D errors are combined to form a set of 2D decisions and corresponding 2D errors. Finally, the 2D decisions and 2D errors are combined to form 4D decisions and corresponding 4D errors.

[0142] FIG. 7 is a simplified, conceptual block diagram of a first exemplary embodiment of a 1D slicing function such as might be implemented by the Viterbi decoder 604 of FIG. 3. Referring to FIG. 7, a 1D component 702 of the eight 4D Viterbi inputs (614 of FIG. 3) is sliced, i.e., detected, in parallel fashion, by a pair of 1D slicers 704 and 706 with respect to the X and Y symbol-subsets. Each slicer 704 and 706 outputs a respective 1D decision 708 and 710 with respect to the appropriate respective symbol-subset X, Y and an associated squared error value 712 and 714. Each 1D decision 708 or 710 is the symbol which is closest to the 1D input 702 in the appropriate symbol-subset X and Y, respectively. The squared error values 712 and 714 each represent the square of the difference between the 1D input 702 and their respective 1D decisions 708 and 710.

[0143] The 1D slicing function shown in FIG. 7 is performed for all four constituent transceivers and for all eight states of the trellis code in order to produce one pair of 1D decisions per transceiver and per state. Thus, the Viterbi decoder 604 has a total of 32 pairs of 1D slicers (four transceivers times eight states) disposed in a manner identical to the pair of slicers 704, 706 illustrated in FIG. 7.
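A software model of one slicer pair of FIG. 7 might look like the sketch below. It assumes X = {−1, +1} and Y = {−2, 0, +2} as the partition of the five PAM levels into the two symbol-subsets; this section only states that X and Y are disjoint and together make up the five levels, so FIGS. 4A and 4B remain authoritative. Function names are hypothetical.

```python
# ASSUMED partition of the five PAM levels into the two symbol-subsets.
X_SUBSET = (-1, 1)
Y_SUBSET = (-2, 0, 2)

def slice_1d(r, subset):
    """Return the symbol of `subset` nearest to the 1D input r and the
    squared error between r and that decision (soft, Euclidean metric)."""
    decision = min(subset, key=lambda s: abs(r - s))
    return decision, (r - decision) ** 2

def slice_xy(r):
    """FIG. 7 style slicer pair: (dec_x, err_x, dec_y, err_y)."""
    dx, ex = slice_1d(r, X_SUBSET)
    dy, ey = slice_1d(r, Y_SUBSET)
    return dx, ex, dy, ey

# slice_xy(0.25) -> (1, 0.5625, 0, 0.0625)
```

In the decoder this pair is replicated for every wire pair and every state, i.e. it would be called on each of the 4 components of each of the 8 Viterbi inputs.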

[0144] FIG. 8 is a simplified block diagram of a second exemplary embodiment of circuitry capable of implementing a 1D slicing function suitable for incorporation in the Viterbi decoder 604 of FIG. 3. Referring to FIG. 8, the 1D component 702 of the eight 4D Viterbi inputs is sliced, i.e., detected, by a first pair of 1D slicers 704 and 706, with respect to the X and Y symbol-subsets, and also by a 5-level slicer 805 with respect to the symbol set which represents the five levels (+2, +1, 0, −1, −2) of the constellation, i.e., the union of the X and Y symbol-subsets. As in the previous case described in connection with FIG. 7, the slicers 704 and 706 output 1D decisions 708 and 710. The 1D decision 708 is the symbol which is nearest the 1D input 702 in the symbol-subset X, while the 1D decision 710 corresponds to the symbol which is nearest the 1D input 702 in the symbol-subset Y. The output 807 of the 5-level slicer 805 corresponds to the particular one of the five constellation symbols which is determined to be closest to the 1D input 702.

[0145] The difference between each decision 708 and 710 and the 5-level slicer output 807 is processed, in a manner to be described in greater detail below, to generate respective quasi-squared error terms 812 and 814. In contrast to the 1D error terms 712, 714 obtained with the first exemplary embodiment of a 1D slicer depicted in FIG. 7, the 1D error terms 812, 814 generated by the exemplary embodiment of FIG. 8 are more easily adapted to discerning relative differences between a 1D decision and a 1D Viterbi input.

[0146] In particular, the slicer embodiment of FIG. 7 may be viewed as performing a “soft decode”, with 1D error terms 712 and 714 represented by Euclidean metrics. The slicer embodiment depicted in FIG. 8 may be viewed as performing a “hard decode”, with its respective 1D error terms 812 and 814 expressed in Hamming metrics (i.e., 1 or 0). Thus, there is less ambiguity as to whether the 1D Viterbi input is closer to the X symbol subset or to the Y symbol subset. Furthermore, Hamming metrics can be expressed in fewer bits than Euclidean metrics, resulting in a system that is substantially less computationally complex and substantially faster.

[0147] In the exemplary embodiment of FIG. 8, error terms are generated by combining the output of the five-level slicer 805 with the outputs of the 1D slicers 704 and 706 in respective adder circuits 809A and 809B. The outputs of the adders are directed to respective squared magnitude blocks 811A and 811B, which generate the binary squared error terms 812 and 814, respectively.
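Under the same assumed partition (X = {−1, +1}, Y = {−2, 0, +2}), the hard-decision scheme of FIG. 8 can be modelled as below. Because X and Y interleave, the nearest symbol in one subset always equals the 5-level decision, while the nearest symbol in the other subset is one level away, so the two squared differences are always 0 and 1. That is what allows the 1D error to be carried in a single bit. The sketch is illustrative, with hypothetical names; the described hardware realizes the same mapping with a look-up table.

```python
# ASSUMED partition of the five PAM levels (see FIGS. 4A and 4B).
X_SUBSET = (-1, 1)
Y_SUBSET = (-2, 0, 2)
FIVE_LEVELS = (-2, -1, 0, 1, 2)  # union of X and Y

def nearest(r, symbols):
    """Nearest symbol to r; ties go to the earlier symbol in the tuple."""
    return min(symbols, key=lambda s: abs(r - s))

def slice_xy_hard(r):
    """FIG. 8 style slicing: compare each subset decision with the 5-level
    decision and square the difference. Returns (dec_x, err_x, dec_y, err_y)."""
    d5 = nearest(r, FIVE_LEVELS)  # output 807 of the 5-level slicer
    dx = nearest(r, X_SUBSET)     # decision 708
    dy = nearest(r, Y_SUBSET)     # decision 710
    return dx, (dx - d5) ** 2, dy, (dy - d5) ** 2

# slice_xy_hard(0.4) -> (1, 1, 0, 0): r is closer to Y, so err_y is 0
```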

[0148] Implementation of squared error terms by use of circuit elements such as adders 809A, 809B and the magnitude squared blocks 811A, 811B is done for descriptive convenience and conceptual illustration purposes only. In practice, squared error term definition is implemented with a look-up table that contains possible values for error-X and error-Y for a given set of decision-X, decision-Y and Viterbi input values. The look-up table can be implemented with a read-only-memory device or, alternatively, a random logic device or PLA. Examples of look-up tables suitable for use in practice of the present invention are illustrated in FIGS. 17, 18A and 18B.

[0149] The 1D slicing function exemplified in FIG. 8 is performed for all four constituent transceivers and for all eight states of the trellis code in order to produce one pair of 1D decisions per transceiver and per state. Thus, the Viterbi decoder 604 has a total of thirty-two pairs of 1D slicers that correspond to the pair of slicers 704, 706, and thirty-two 5-level slicers that correspond to the 5-level slicer 805 of FIG. 8.

[0150] Each of the 1D errors is represented by substantially fewer bits than each 1D component of the 4D inputs. For example, in the embodiment of FIG. 7, the 1D component of the 4D Viterbi input is represented by 5 bits, while the 1D error is represented by 2 or 3 bits. Traditionally, proper soft decision decoding of such a trellis code would require that the distance metric (Euclidean distance) be represented by 6 to 8 bits. One advantageous feature of the present invention is that only 2 or 3 bits are required for the distance metric in soft decision decoding of this trellis code.

[0151] In the embodiment of FIG. 8, the 1D error can be represented by just 1 bit. It is noted that, since the 1D error is represented by 1 bit, the distance metric used in this trellis decoding is no longer the Euclidean distance, which is usually associated with trellis decoding, but is instead the Hamming distance, which is usually associated with hard decision decoding of binary codewords. This is another particularly advantageous feature of the present invention.

[0152] FIG. 9 is a block diagram illustrating the generation of the 2D errors from the 1D errors for twisted pairs A and B (corresponding to constituent transceivers A and B). Since the generation of errors is similar for twisted pairs C and D, this discussion will only concern itself with the A:B 2D case. It will be understood that the discussion is equally applicable to the C:D 2D case with the appropriate change in notation. Referring to FIG. 9, 1D error signals 712A, 712B, 714A, 714B might be produced by the exemplary 1D slicing functional blocks shown in FIG. 7 or 8. The 1D error term signal 712A (respectively, 712B) is obtained by slicing, with respect to symbol-subset X, the 1D component of the 4D Viterbi input which corresponds to pair A (respectively, pair B). The 1D error term 714A (respectively, 714B) is obtained by slicing, with respect to symbol-subset Y, the 1D component of the 4D Viterbi input which corresponds to pair A (respectively, pair B). The 1D errors 712A, 712B, 714A, 714B are added according to all possible combinations (XX, XY, YX and YY) to produce 2D error terms 902AB, 904AB, 906AB, 908AB for pairs A and B. Similarly, the 1D errors 712C, 712D, 714C, 714D (not shown) are added according to the four different symbol-subset combinations (XX, XY, YX and YY) to produce corresponding 2D error terms for wire pairs C and D.
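The A:B case of FIG. 9 amounts to summing the 1D errors over the four subset combinations. A minimal Python sketch, with a hypothetical function name and with pairs A through D indexed 0 through 3:

```python
from itertools import product

def two_d_errors(err_x, err_y, i, j):
    """Combine the 1D errors of wire pairs i and j into four 2D errors.

    err_x[k], err_y[k]: 1D errors of pair k with respect to subsets X and Y.
    Returns a dict keyed by combination; e.g. "XY" means pair i sliced
    against X and pair j sliced against Y.
    """
    table = {"X": err_x, "Y": err_y}
    return {a + b: table[a][i] + table[b][j]
            for a, b in product("XY", repeat=2)}

# With hard-decision errors err_x = [1, 0, 1, 1], err_y = [0, 1, 0, 0]:
# two_d_errors(err_x, err_y, 0, 1) -> {"XX": 1, "XY": 2, "YX": 0, "YY": 1}
```

Calling it with (i, j) = (2, 3) gives the corresponding C:D terms.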

[0153] FIG. 10 is a block diagram illustrating the generation of the 4D errors and extended path metrics for the four extended paths outgoing from state 0. Referring to FIG. 10, the 2D errors 902AB, 902CD, 904AB, 904CD, 906AB, 906CD, 908AB, 908CD are added in pairs according to eight different combinations to produce eight intermediate 4D errors 1002, 1004, 1006, 1008, 1010, 1012, 1014, 1016. For example, the 2D error 902AB, which is the squared error with respect to XX from pairs A and B, is added to the 2D error 902CD, which is the squared error with respect to XX from pairs C and D, to form the intermediate 4D error 1002, which is the squared error with respect to sub-subset XXXX for pairs A, B, C and D. Similarly, the intermediate 4D error 1004, which corresponds to the squared error with respect to sub-subset YYYY, is formed from the 2D errors 908AB and 908CD.

[0154] The eight intermediate 4D errors are grouped in pairs to correspond to the code subsets s0, s2, s4 and s6 represented in FIG. 4B. For example, the intermediate 4D errors 1002 and 1004 are grouped together to correspond to the code subset s0, which is formed by the union of the XXXX and YYYY sub-subsets. From each pair of intermediate 4D errors, the one with the lowest value is selected (the other one being discarded) in order to provide the branch metric of a transition in the trellis diagram from state 0 to a subsequent state. It is noted that, according to the trellis diagram, transitions from an even state (i.e., 0, 2, 4 and 6) are only allowed to be to the states 0, 1, 2 and 3, and transitions from an odd state (i.e., 1, 3, 5 and 7) are only allowed to be to the states 4, 5, 6 and 7. Each of the index signals 1026, 1028, 1030, 1032 indicates which of the two sub-subsets the selected intermediate 4D error corresponds to. The branch metrics 1018, 1020, 1022, 1024 are the branch metrics for the transitions in the trellis diagram of FIG. 5 associated with code-subsets s0, s2, s4 and s6, respectively, from state 0 to states 0, 1, 2 and 3, respectively. The branch metrics are added to the previous path metric 1000 for state 0 in order to produce the extended path metrics 1034, 1036, 1038, 1040 of the four extended paths outgoing from state 0 to states 0, 1, 2 and 3, respectively.
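The following Python sketch models the state-0 computation of FIG. 10. Only s0 (the union of XXXX and YYYY) is spelled out in this section; the sub-subset pairs used here for s2, s4 and s6 are assumptions, and FIG. 4B is authoritative. Ties between the two sub-subsets go to the first one, which is also an arbitrary choice. All names are hypothetical.

```python
# Each even code subset as the union of two sub-subsets (order A, B, C, D).
# Only s0 is given in the text; s2, s4 and s6 are ASSUMED pairings.
EVEN_SUBSETS = {
    "s0": ("XXXX", "YYYY"),
    "s2": ("XXYY", "YYXX"),
    "s4": ("XYYX", "YXXY"),
    "s6": ("XYXY", "YXYX"),
}
# Per the text: from state 0, s0, s2, s4, s6 lead to states 0, 1, 2, 3.
NEXT_STATE_FROM_0 = {"s0": 0, "s2": 1, "s4": 2, "s6": 3}

def branch_metrics_state0(e2_ab, e2_cd, prev_metric):
    """e2_ab, e2_cd: 2D errors keyed "XX", "XY", "YX", "YY".
    Returns {next_state: (extended_path_metric, branch_metric, index)},
    where index (0 or 1) names the winning sub-subset (the index signal)."""
    result = {}
    for name, pair in EVEN_SUBSETS.items():
        # Intermediate 4D error = A:B part of the label + C:D part.
        inter = [e2_ab[label[:2]] + e2_cd[label[2:]] for label in pair]
        index = 0 if inter[0] <= inter[1] else 1
        bm = inter[index]
        result[NEXT_STATE_FROM_0[name]] = (prev_metric + bm, bm, index)
    return result
```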

[0155] Associated with the eight intermediate 4D errors 1002, 1004, 1006, 1008, 1010, 1012, 1014, 1016 are the 4D decisions which are formed from the 1D decisions made by one of the exemplary slicer embodiments of FIG. 7 or 8. Associated with the branch metrics 1018, 1020, 1022, 1024 are the 4D symbols derived by selecting the 4D decisions using the index outputs 1026, 1028, 1030, 1032.

[0156] FIG. 11 shows the generation of the 4D symbols associated with the branch metrics 1018, 1020, 1022, 1024. Referring to FIG. 11, the 1D decisions 708A, 708B, 708C, 708D are the 1D decisions with respect to symbol-subset X (as shown in FIG. 7) for constituent transceivers A, B, C, D, respectively, and the 1D decisions 710A, 710B, 710C, 710D are the 1D decisions with respect to symbol-subset Y for constituent transceivers A, B, C and D, respectively. The 1D decisions are concatenated according to the combinations which correspond to a left or right hand portion of the code subsets s0, s2, s4 and s6, as depicted in FIG. 4B. For example, the 1D decisions 708A, 708B, 708C, 708D are concatenated to correspond to the left hand portion, XXXX, of the code subset s0. The 4D decisions are grouped in pairs to correspond to the union of symbol-subset portions making up the code subsets s0, s2, s4 and s6. In particular, the 4D decisions for the XXXX and YYYY portions are grouped together to correspond to the code subset s0, which is formed by the union of the XXXX and YYYY subset portions.

[0157] Referring to FIG. 11, the pairs of 4D decisions are input to the multiplexers 1120, 1122, 1124, 1126, which receive the index signals 1026, 1028, 1030, 1032 (FIG. 10) as select signals. Each of the multiplexers selects, from a pair of the 4D decisions, the 4D decision which corresponds to the sub-subset indicated by the corresponding index signal, and outputs the selected 4D decision as the 4D symbol for the branch whose branch metric is associated with the index signal. The 4D symbols 1130, 1132, 1134, 1136 correspond to the transitions in the trellis diagram of FIG. 5 associated with code-subsets s0, s2, s4 and s6, respectively, from state 0 to states 0, 1, 2 and 3, respectively. Each of the 4D symbols 1130, 1132, 1134, 1136 is the codeword in the corresponding code-subset (s0, s2, s4 and s6) which is closest to the 4D Viterbi input for state 0 (there is a 4D Viterbi input for each state). The associated branch metric (FIG. 10) is the 4D squared distance between the codeword and the 4D Viterbi input for state 0.
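In a software model using the same assumed sub-subset pairs for s2, s4 and s6 (only s0 is given in the text), the multiplexers 1120 to 1126 of FIG. 11 reduce to building both candidate 4D decisions from the 1D decisions and letting the index signal pick one. The names below are hypothetical.

```python
# Only s0 is given in the text; s2, s4 and s6 are ASSUMED pairings.
EVEN_SUBSETS = {
    "s0": ("XXXX", "YYYY"),
    "s2": ("XXYY", "YYXX"),
    "s4": ("XYYX", "YXXY"),
    "s6": ("XYXY", "YXYX"),
}

def four_d_symbols(dec_x, dec_y, indices):
    """dec_x[k], dec_y[k]: 1D decisions of pair k (A..D = 0..3) for X and Y.
    indices[name]: index signal (0 or 1) for code subset `name`.
    Returns {name: 4D symbol} for the branches leaving state 0."""
    table = {"X": dec_x, "Y": dec_y}
    symbols = {}
    for name, pair in EVEN_SUBSETS.items():
        # Concatenate 1D decisions to form both candidate 4D decisions ...
        candidates = [tuple(table[c][k] for k, c in enumerate(label))
                      for label in pair]
        # ... and let the index signal act as the multiplexer select.
        symbols[name] = candidates[indices[name]]
    return symbols
```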

[0158] FIG. 12 illustrates the selection of the best path incoming to state 0. The extended path metrics of the four paths incoming to state 0 from states 0, 2, 4 and 6 are input to the comparator module 1202, which selects the best path, i.e., the path with the lowest path metric, and outputs the Path 0 Select signal 1206 as an indicator of this path selection, together with the associated path metric 1204.

[0159] The procedure described above for processing a 4D Viterbi input for state 0 of the code to obtain four branch metrics, four extended path metrics, and four corresponding 4D symbols is similar for the other states. For each of the other states, the selection of the best path from the four incoming paths to that state is also similar to the procedure described in connection with FIG. 12.

[0160] The above discussion of the computation of the branch metrics, illustrated by FIGS. 7 through 11, is an exemplary application of the method for slicing (detecting) a received L-dimensional word and for computing the distance of the received L-dimensional word from a codeword, for the particular case where L is equal to 4.

[0161] In general terms, i.e., for any value of L greater than 2, the method can be described as follows. The codewords of the trellis code are constellation points chosen from 2^(L−1) code-subsets. A codeword is a concatenation of L symbols selected from two disjoint symbol-subsets and is a constellation point belonging to one of the 2^(L−1) code-subsets. At the receiver, L inputs are received, each of the L inputs uniquely corresponding to one of the L dimensions. The received word is formed by the L inputs. To detect the received word, 2^(L−1) identical input sets are formed by assigning the same L inputs to each of the 2^(L−1) input sets. Each of the L inputs of each of the 2^(L−1) input sets is sliced with respect to each of the two disjoint symbol-subsets to produce an error set of 2L one-dimensional errors for each of the 2^(L−1) code-subsets. For the particular case of the trellis code of the type described by the trellis diagram of FIG. 5, the one-dimensional errors are combined within each of the 2^(L−1) error sets to produce 2^(L−2) L-dimensional errors for the corresponding code-subset, such that each of the 2^(L−2) L-dimensional errors is a distance between the received word and one of the codewords in the corresponding code-subset.

[0162] One embodiment of this combining operation can be described as follows. First, the 2L one-dimensional errors are combined to produce 2L two-dimensional errors (FIG. 9). Then, the 2L two-dimensional errors are combined to produce 2^L intermediate L-dimensional errors, which are arranged into 2^(L−1) pairs of errors such that these pairs of errors correspond one-to-one to the 2^(L−1) code-subsets (FIG. 10, signals 1002 through 1016). A minimum is selected for each of the 2^(L−1) pairs of errors (FIG. 10, signals 1026, 1028, 1030, 1032). These minima are the 2^(L−1) L-dimensional errors. Due to the constraints on transitions from one state to a successor state, as shown in the trellis diagram of FIG. 5, only half of the 2^(L−1) L-dimensional errors correspond to allowed transitions in the trellis diagram. These 2^(L−2) L-dimensional errors are associated with 2^(L−2) L-dimensional decisions. Each of the 2^(L−2) L-dimensional decisions is a codeword closest in distance to the received word (the distance being represented by one of the 2^(L−2) L-dimensional errors), the codeword being in one half of the 2^(L−1) code-subsets, i.e., in one of 2^(L−2) code-subsets of the 2^(L−1) code-subsets (due to the particular constraint of the trellis code described by the trellis diagram of FIG. 5).
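For general L, the combining step can be sketched as below. The sketch assumes, by analogy with s0 being the union of XXXX and YYYY, that each code-subset is the union of an X/Y label pattern and its complement; that pairing rule, like the function name, is an assumption. It returns all 2^(L−1) minima and leaves out the code-specific selection of the allowed half.

```python
from itertools import product

def code_subset_errors(err_x, err_y):
    """err_x[k], err_y[k]: 1D errors of dimension k for subsets X and Y.

    Forms the 2**L intermediate L-dimensional errors (one per X/Y label
    pattern), pairs each pattern with its complement (ASSUMED pairing rule),
    and keeps the minimum of each pair, giving 2**(L-1) results.
    Returns {(pattern, complement): (min_error, winning_pattern)}.
    """
    L = len(err_x)
    table = {"X": err_x, "Y": err_y}
    inter = {"".join(p): sum(table[c][k] for k, c in enumerate(p))
             for p in product("XY", repeat=L)}
    flip = str.maketrans("XY", "YX")
    result = {}
    for pattern in inter:
        if pattern[0] != "X":
            continue  # visit each complementary pair once, via its X-led member
        comp = pattern.translate(flip)
        best = min((pattern, comp), key=lambda s: inter[s])
        result[(pattern, comp)] = (inter[best], best)
    return result
```

For L = 4 this yields eight pairs, the same count as the eight code subsets s0 through s7.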

[0163] It is important to note that the details of the combining operation on the 2L one-dimensional errors to produce the final L-dimensional errors, and the number of the final L-dimensional errors, are functions of a particular trellis code. In other words, they vary depending on the particular trellis code.

[0164] FIG. 13 illustrates the construction of the path memory module 608 as implemented in the embodiment of FIG. 3. The path memory module 608 includes a path memory for each of the eight paths. In the illustrated embodiment of the invention, the path memory for each path is implemented as a register stack, ten levels in depth. At each level, a 4D symbol is stored in a register. The number of path memory levels is chosen as a tradeoff between receiver latency and detection accuracy. FIG. 13 only shows the path memory for path 0 and continues with the example discussed in FIGS. 7-12. FIG. 13 illustrates how the 4D decision for path 0 is stored in the path memory module 608, and how the Path 0 Select signal, i.e., the information about which one of the four incoming extended paths to state 0 was selected, is used in the corresponding path memory to force merging of the paths at all depth levels (levels 0 through 9) in the path memory.

[0165] Referring to FIG. 13, each of the ten levels of the path memory includes a 4-to-1 multiplexer (4:1 MUX) and a register to store a 4D decision. The registers are numbered according to their depth levels. For example, register 0 is at depth level 0. The Path 0 Select signal 1206 (FIG. 12) is used as the select input for the 4:1 MUXes 1302, 1304, 1306, . . . , 1320. The 4D decisions 1130, 1132, 1134, 1136 (FIG. 11) are input to the 4:1 MUX 1302, which selects one of the four 4D decisions based on the Path 0 Select signal 1206 and stores it in register 0 of path 0. One symbol period later, register 0 of path 0 outputs the selected 4D decision to the 4:1 MUX 1304. The other three 4D decisions input to the 4:1 MUX 1304 are from the registers 0 of paths 2, 4, and 6. Based on the Path 0 Select signal 1206, the 4:1 MUX 1304 selects one of the four 4D decisions and stores it in register 1 of path 0. One symbol period later, register 1 of path 0 outputs the selected 4D decision to the 4:1 MUX 1306. The other three 4D decisions input to the 4:1 MUX 1306 are from the registers 1 of paths 2, 4, and 6. Based on the Path 0 Select signal 1206, the 4:1 MUX 1306 selects one of the four 4D decisions and stores it in register 2 of path 0. This procedure continues for levels 3 through 9 of the path memory for path 0. During continuous operation, ten 4D symbols representing path 0 are stored in registers 0 through 9 of the path memory for path 0.
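The register-exchange behavior of FIG. 13 can be modelled for all eight paths at once. In this hypothetical Python sketch, `select[s]` plays the role of the Path s Select signal and `new_symbols[s]` is the 4D symbol of the winning branch into state s; all registers load simultaneously from the previous contents, as clocked registers would.

```python
DEPTH = 10       # path memory levels in the illustrated embodiment
NUM_STATES = 8

def update_path_memory(memory, select, new_symbols):
    """One symbol period of a register-exchange path memory.

    memory[s][k]: 4D symbol in the level-k register of path s (level 0 newest).
    select[s]: predecessor state chosen for state s (Path s Select).
    new_symbols[s]: 4D symbol of the winning branch into state s.
    Level k of path s is loaded from level k-1 of path select[s]; the
    oldest symbol of the predecessor path drops out of the window.
    """
    return [
        [new_symbols[s]] + memory[select[s]][:DEPTH - 1]
        for s in range(NUM_STATES)
    ]
```

Copying the whole predecessor row at once is what forces the merging at every depth level: after the update, path s shares its entire history with the path it was extended from.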

[0166] Similarly to path 0, each of the paths 1 through 7 is stored as ten 4D symbols in the registers of the corresponding path memory. The connections between the MUX of one path and the registers of different paths follow the trellis diagram of FIG. 5. For example, the MUX at level k for path 5 receives as inputs the outputs of the registers at level k−1 for paths 1, 3, 5, 7, and the MUX at level k for path 2 receives as inputs the outputs of the registers at level k−1 for paths 0, 2, 4, 6.

[0167] FIG. 14 is a block diagram illustrating the computation of the final decision and the tentative decisions in the path memory module 608 based on the 4D symbols stored in the path memory for each state. At each iteration of the Viterbi algorithm, the best of the eight states, i.e., the one associated with the path having the lowest path metric, is selected, and the 4D symbol from the associated path stored at the last level of the path memory is selected as the final decision 40 (FIG. 3). Symbols at lower depth levels are selected as tentative decisions, which are used to feed the delay line of the DFE 612 (FIG. 3).

[0168] Referring to FIG. 14, the path metrics 1402 of the eight states, obtained from the procedure of FIG. 12, are input to the comparator module 1406, which selects the one with the lowest value and provides an indicator 1401 of this selection to the select inputs of the 8-to-1 multiplexers (8:1 MUXes) 1402, 1404, 1406, . . . , 1420, which are located at path memory depth levels 0 through 9, respectively. Each of the 8:1 MUXes receives eight 4D symbols output from corresponding registers for the eight paths, the corresponding registers being located at the same depth level as the MUX, and selects one of the eight 4D symbols to output, based on the select signal 1401. The outputs of the 8:1 MUXes located at depth levels 0 through 9 are V₀, V₁, V₂, . . . , V₉, respectively.

[0169] In the illustrated embodiment, one set of eight signals, output by the first register set (the register 0 set) to the first MUX 1402, is also taken off as a set of eight outputs, denoted V₀^(i), and provided to the MDFE (602 of FIG. 3) as a select signal which is used in a manner to be described below. Although only the first register set is illustrated as providing outputs to the MDFE, the invention contemplates the second, or even higher order, register sets also providing similar outputs. In cases where multiple register sets provide outputs, these are identified by the register set depth order as a subscript, as in V₁^(i), and the like.
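Reading the path memory as described in paragraphs [0168] and [0169] can be sketched as follows, using a hypothetical list-of-lists layout memory[path][level] with level 0 the newest. The row of 8:1 MUXes reduces to indexing the best path, and V₀^(i) is simply the level-0 entry of every path. Names are illustrative only.

```python
def best_state(path_metrics):
    """Index of the state whose survivor has the lowest path metric."""
    return min(range(len(path_metrics)), key=lambda s: path_metrics[s])

def level_outputs(memory, path_metrics):
    """V_0 ... V_(D-1): the best path's symbol at every depth level, as
    produced by the 8:1 MUXes. The last entry (oldest symbol) feeds the
    final decision; shallower entries serve as tentative decisions."""
    return list(memory[best_state(path_metrics)])

def per_path_symbols(memory, level=0):
    """V_level^(i) for every path i; level 0 gives the V_0^(i) set."""
    return [path[level] for path in memory]
```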

[0170] In the illustrated embodiment, the MUX outputs V₀, V₁, V₂ are delayed by one unit of time, and are then provided as the tentative decisions V_(0F), V_(1F), V_(2F) to the DFE 612. The number of the outputs V_(i) to be used as tentative decisions depends on the required accuracy and speed of decoding operation. After further delay, the output V₀ of the first MUX 1402 is also provided as the 4D tentative decision 44 (FIG. 2) to the feedforward equalizers 26 of the four constituent transceivers and the timing recovery block 222 (FIG. 2). The 4D symbol V_(9F), which is the output V₉ of the 8:1 MUX 1420 delayed by one time unit, is provided as the final decision 40 to the receive section of the PCS 204R (FIG. 2).

[0171] The following discussion describes how the outputs V₀^(i), V₁^(i), V_(0F), V_(1F), V_(2F) of the path memory module 608 might be used in the select logic 610, the MDFE 602, and the DFE 612 (FIG. 3).

[0172] FIG. 15 is a block level diagram of the ISI compensation portion of the decoder, including construction and operational details of the DFE and MDFE circuitry (612 and 602 of FIG. 3, respectively). The ISI compensation embodiment depicted in FIG. 15 is adapted to receive signal samples from the deskew memory (36 of FIG. 2) and provide ISI compensated signal samples to the Viterbi (slicer) for decoding. The embodiment illustrated in FIG. 15 includes the Viterbi block 1502 (which includes the Viterbi decoder 604, the path metrics module 606 and the path memory module 608), the select logic 610, the MDFE 602 and the DFE 612.

[0173] The MDFE 602 computes an independent feedback signal for each of the paths stored in the path memory module 608. These feedback signals represent different hypotheses for the intersymbol interference component present in the input 37 (FIGS. 2 and 6) to the trellis decoder 38. The different hypotheses for the intersymbol interference component correspond to the different hypotheses about the previous symbols, which are represented by the different paths of the Viterbi decoder.

[0174] The Viterbi algorithm tests these hypotheses and identifies the most likely one. It is an essential aspect of the Viterbi algorithm to postpone this identifying decision until there is enough information to minimize the probability of error in the decision. In the meantime, all the possibilities are kept open. Ideally, the MDFE block would use the entire length of the path memory to compute the different feedback signals. In practice, this is not possible because it would lead to unacceptable complexity, meaning a very large number of components and an extremely complex interconnection pattern.

[0175] Therefore, in the exemplary embodiment, the part of the feedback signal computation that is performed on a per-path basis is limited to the two most recent symbols stored in register set 0 and register set 1 of all paths in the path memory module 608, namely V₀^(i) and V₁^(i), with i = 0, . . . , 7 indicating the path. For symbols older than two periods, a hard decision is forced, and only one replica of a “tail” component of the intersymbol interference is computed. This results in some marginal loss of performance, which is more than adequately compensated for by a simpler system implementation.

[0176] The DFE 612 computes this “tail” component of the intersymbol interference, based on the tentative decisions V_(0F), V_(1F), and V_(2F). The reason for using three different tentative decisions is that the reliability of the decisions increases with increasing depth into the path memory. For example, V_(1F) is a more reliable version of V_(0F) delayed by one symbol period. In the absence of errors, V_(1F) would always be equal to a delayed version of V_(0F). In the presence of errors, V_(1F) may differ from V_(0F), and the probability of V_(1F) being in error is lower than the probability of V_(0F) being in error. Similarly, V_(2F) is a more reliable delayed version of V_(1F).

[0177] Referring to FIG. 15, the DFE 612 is a filter having 33 coefficients C₀ through C₃₂, corresponding to 33 taps, and a delay line 1504. The delay line is constructed of sequentially disposed summing junctions and delay elements, such as registers, as is well understood in the art of filter design. In the illustrated embodiment, the coefficients of the DFE 612 are updated once every four symbol periods, i.e., every 32 nanoseconds, using the well-known Least Mean Squares (LMS) algorithm, based on a decision input 1505 from the Viterbi block and an error input 42 dfe.
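As an illustration of the LMS update mentioned above, the following hypothetical sketch adapts a short DFE in a noiseless toy model, updating the coefficients once every four symbols. It assumes error-free decisions and a sign convention in which the error is the soft decision minus the decision (as for error signal 42). The function names, the step size `mu`, and the tap values are illustrative assumptions; the real circuit uses the decision input 1505 and the error 42 dfe rather than these Python variables.

```python
import random
from typing import List, Sequence

PAM5 = (-2, -1, 0, 1, 2)


def lms_update(coeffs: Sequence[float], past: Sequence[int],
               error: float, mu: float) -> List[float]:
    """One LMS step: C_k <- C_k + mu * e * a[n-1-k].

    past[k] is the decision for the symbol k+1 periods ago (past[0] is the
    most recent). error is the soft decision minus the decision, so a
    positive error means the ISI replica was too small.
    """
    return [c + mu * error * a for c, a in zip(coeffs, past)]


def train_dfe(true_isi: Sequence[float], n_symbols: int = 4000,
              mu: float = 0.01, update_every: int = 4,
              seed: int = 0) -> List[float]:
    """Noiseless toy model: adapt the DFE taps, updating every 4th symbol."""
    rng = random.Random(seed)
    coeffs = [0.0] * len(true_isi)
    history = [0] * len(true_isi)        # past decisions, most recent first
    for n in range(n_symbols):
        a = rng.choice(PAM5)             # transmitted PAM-5 symbol
        received = a + sum(h * s for h, s in zip(true_isi, history))
        soft = received - sum(c * s for c, s in zip(coeffs, history))
        error = soft - a                 # decisions assumed error-free
        if n % update_every == 0:
            coeffs = lms_update(coeffs, history, error, mu)
        history = [a] + history[:-1]
    return coeffs
```

Updating only every fourth symbol lowers the update rate at the cost of slower convergence; in this toy model the taps still settle close to the assumed channel values.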

[0178] The symbols V_(0F), V_(1F), and V_(2F) are “jammed”, meaning input at various locations, into the delay line 1504 of the DFE 612. Based on these symbols, the DFE 612 produces an intersymbol interference (ISI) replica portion associated with all previous symbols except the two most recent (since it is derived without using the first two taps of the DFE 612). The ISI replica portion is subtracted from the output 37 of the deskew memory block 36 to produce the signal 1508, which is then fed to the MDFE block. The signal 1508 is denoted as the “tail” component in FIG. 3. In the illustrated embodiment, the DFE 612 has 33 taps, numbered 0 through 32, and the tail component 1508 is associated with taps 2 through 32. As shown in FIG. 15, for circuit layout reasons, the tail component 1508 is obtained in two steps. First, the ISI replica associated with taps 3 through 32 is subtracted from the deskew memory output 37 to produce an intermediate signal 1507. Then, the ISI replica associated with tap 2 is subtracted from the intermediate signal 1507 to produce the tail component 1508.
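The two-step subtraction produces the same tail value as a single subtraction of the replica for taps 2 through 32; only the order of the additions differs. The sketch below checks that equivalence. It assumes `past[k]` holds the hard decision for the symbol k+1 periods ago and that `coeffs[k]` is C_k; both names and this indexing convention are illustrative assumptions.

```python
from typing import Sequence


def tail_two_step(y: float, coeffs: Sequence[float],
                  past: Sequence[float]) -> float:
    """Tail component 1508 computed in two steps, as described for FIG. 15."""
    # Step 1: subtract the ISI replica for taps 3..32 -> intermediate signal 1507.
    intermediate = y - sum(coeffs[k] * past[k] for k in range(3, len(coeffs)))
    # Step 2: subtract the ISI replica for tap 2 -> tail component 1508.
    return intermediate - coeffs[2] * past[2]


def tail_one_step(y: float, coeffs: Sequence[float],
                  past: Sequence[float]) -> float:
    """Same quantity in one step (taps 2..32 at once)."""
    return y - sum(coeffs[k] * past[k] for k in range(2, len(coeffs)))
```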

[0179] The DFE 612 also computes the ISI replica 1510 associated with the two most recent symbols, based on the tentative decisions V_(0F), V_(1F), and V_(2F). This ISI replica 1510 is subtracted from a delayed version of the output 37 of the deskew memory block 36 to provide a soft decision 43. The tentative decision V_(0F) is subtracted from the soft decision 43 to provide an error signal 42. Error signal 42 is further processed into several additional representations, identified as 42 enc, 42 ph and 42 dfe. The error 42 enc is provided to the echo cancelers and NEXT cancelers of the constituent transceivers. The error 42 ph is provided to the FFEs 26 (FIG. 2) of the four constituent transceivers and to the timing recovery block 222. The error 42 dfe is directed to the DFE 612, where it is used, together with the last tentative decision V_(2F) from the Viterbi block 1502, for the adaptive updating of the coefficients of the DFE. The tentative decision 44 shown in FIG. 3 is a delayed version of V_(0F). The soft decision 43 is output to a test interface for display purposes.

[0180] The DFE 612 provides the tail component 1508 and the values of the two “initial” coefficients C₀ and C₁ to the MDFE 602. The MDFE 602 computes eight different replicas of the ISI associated with the first two coefficients of the DFE 612. Each of these ISI replicas corresponds to a different path in the path memory module 608. This computation is part of the so-called “critical path” of the trellis decoder 38, in other words, the sequence of computations that must be completed in a single symbol period. At the speed of operation of the Gigabit Ethernet transceivers, the symbol period is 8 nanoseconds. All the challenging computations for 4D slicing, branch metrics, path extensions, selection of the best path, and update of the path memory must be completed within one symbol period. In addition, before these computations can even begin, the MDFE 602 must have completed the computation of the eight 4D Viterbi inputs 614 (FIG. 3), which involves computing the ISI replicas and subtracting them from the output 37 of the deskew memory block 36 (FIG. 2). This bottleneck in the computations is very difficult to resolve. The system of the present invention allows the computations to be carried out smoothly in the allocated time.

[0181] Referring to FIG. 15, the MDFE 602 provides ISI compensation to received signal samples, provided at the output 37 of the deskew memory (36 of FIG. 2), before providing them, in turn, to the input of the Viterbi block 1502. ISI compensation is performed by subtracting a multiplicity of derived ISI replica components from a received signal sample so as to develop a multiplicity of signals that, together, represent the various forms of ISI compensation that might be associated with any arbitrary symbol. One of these ISI compensated signals is then chosen, based on two tentative decisions made by the Viterbi block, as the input signal sample to the Viterbi.

[0182] Since the symbols under consideration belong to a PAM-5 alphabet, they can take only one of five possible values (−2, −1, 0, +1, +2). Representations of these five values are stored in a convolution engine 1511, where they are convolved with the values of the first two filter coefficients C₀ and C₁ of the DFE 612. Because there are two coefficient values and five level representations, the convolution engine 1511 necessarily gives a twenty-five value result that might be expressed as (a_(i)C₀ + b_(j)C₁), with C₀ and C₁ representing the coefficients, and with a_(i) and b_(j) representing the level expressions (with i = 1, 2, 3, 4, 5 and j = 1, 2, 3, 4, 5 ranging independently).

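The convolution engine's output can be enumerated directly. In this minimal sketch (the function name is hypothetical), the 25 entries are keyed by the level pair (a, b), with a weighting C₀ and b weighting C₁.

```python
from itertools import product
from typing import Dict, Tuple

PAM5 = (-2, -1, 0, 1, 2)


def convolution_engine(c0: float, c1: float) -> Dict[Tuple[int, int], float]:
    """All 25 ISI hypotheses a*C0 + b*C1 for PAM-5 levels a and b."""
    return {(a, b): a * c0 + b * c1 for a, b in product(PAM5, repeat=2)}
```

The table depends only on C₀ and C₁, so it can be recomputed whenever those coefficients change rather than on every symbol.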
[0183] These twenty-five values are negatively combined with the tail component 1508 received from the DFE 612. The tail component 1508 is a signal sample from which a partial ISI component associated with taps 2 through 32 of the DFE 612 has been subtracted. In effect, the MDFE 602 is operating on a partially ISI compensated (pre-compensated) signal sample. Each of the twenty-five pre-computed values is subtracted from the partially compensated signal sample in a respective one of a stack of twenty-five summing junctions. The MDFE then saturates the twenty-five results to make them fit in a predetermined range. This saturation process is done to reduce the number of bits of each of the 1D components of the Viterbi input 614 in order to facilitate lookup table computations of branch metrics. The MDFE 602 then stores the resultant ISI compensated signal samples in a stack of twenty-five registers, which makes the samples available to a 25:1 MUX for input sample selection. One of the contents of the twenty-five registers will correspond to a component of a 4D Viterbi input with the ISI correctly cancelled, provided that there was no decision error (meaning an error in the hard decision regarding the best path forced upon taps 2 through 32 of the DFE 612) in the computation of the tail component. In the absence of noise, this particular value will coincide with one of the ideal 5-level symbol values (i.e., −2, −1, 0, 1, 2). In practice, there will always be noise, so this value will in general differ from every ideal symbol value.
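The adder, saturator and register stack of the MDFE can be modeled behaviorally as follows. This is a sketch under stated assumptions: the register ordering (index = 5*idx(a) + idx(b)) and the function name are hypothetical, and the saturation bounds `lo` and `hi` are parameters because the actual range is set by the word lengths of FIG. 16.

```python
from itertools import product
from typing import List

PAM5 = (-2, -1, 0, 1, 2)


def mdfe_registers(tail: float, c0: float, c1: float,
                   lo: float, hi: float) -> List[float]:
    """Contents of the 25 MDFE registers for one 1D component.

    Register index = 5 * idx(a) + idx(b), where a is the hypothesized most
    recent symbol (weighted by C0) and b the one before it (weighted by C1).
    This ordering is an illustrative assumption.
    """
    regs = []
    for a, b in product(PAM5, repeat=2):
        value = tail - (a * c0 + b * c1)      # one of 25 parallel adders
        regs.append(min(hi, max(lo, value)))  # saturator
    return regs
```

For example, with C₀ = 0.25, C₁ = −0.125, a transmitted symbol of 1 and previous symbols −1 and 2, the tail is 1 − 0.25 − 0.25 = 0.5, and the register for the hypothesis (a, b) = (−1, 2) holds the ISI-free value 1.0.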

[0184] This ISI compensation scheme can be expanded to accommodate any number of symbolic levels. If signal processing were performed on PAM-7 signals, for example, the convolution engine 1511 would output forty-nine values, i.e., i and j would each range from 1 to 7. The error rate could be reduced, i.e., performance could be improved, at the expense of greater system complexity, by increasing the number of DFE coefficients input to the convolution engine 1511. The reason for this improvement is that the forced hard decision (regarding the best path forced upon taps 2 through 32 of the DFE 612) that goes into the “tail” computation is delayed. If C₂ were added to the process, and the symbols are again expressed in a PAM-5 alphabet, the convolution engine 1511 would output one hundred twenty-five (125) values. The error rate is reduced by shortening the tail component computation, but at the expense of now requiring 125 summing junctions and registers, and a 125:1 MUX.
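The growth in hardware can be stated as a simple count. Writing L for the number of signal levels and K for the number of DFE coefficients handled by the convolution engine (symbols introduced here only for illustration), the number of ISI hypotheses, and hence of summing junctions, saturators, registers and MUX inputs per 1D component, is:

```latex
N = L^{K}:\qquad
5^{2} = 25 \ (\text{PAM-5},\ C_0, C_1),\qquad
7^{2} = 49 \ (\text{PAM-7},\ C_0, C_1),\qquad
5^{3} = 125 \ (\text{PAM-5},\ C_0, C_1, C_2)
```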

[0185] It is important to note that, as inputs to the DFE 612, the tentative decisions V_(0F), V_(1F), V_(2F) are time sequences, and not just instantaneous isolated symbols. If there is no error in the tentative decision sequence V_(0F), then the time sequence V_(2F) will be the same as the time sequence V_(1F) delayed by one time unit, and the same as the time sequence V_(0F) delayed by two time units. However, due to occasional decision errors in the time sequence V_(0F), which may have been corrected in the more reliable time sequence V_(1F) or V_(2F), the time sequences V_(1F) and V_(2F) may not exactly correspond to time-shifted versions of the time sequence V_(0F). For this reason, instead of using just the one sequence V_(0F), all three sequences V_(0F), V_(1F) and V_(2F) are used as inputs to the DFE 612. Although this implementation is essentially equivalent to convolving V_(0F) with all the DFE's coefficients when there is no decision error in V_(0F), it has the added advantage of reducing the probability of introducing a decision error into the DFE 612. It is noted that other tentative decision sequences along the depth of the path memory 608 may be used instead of the sequences V_(0F), V_(1F) and V_(2F).

[0186] Tentative decisions, developed by the Viterbi, are taken from selected locations in the path memory 608 and “jammed” into the DFE 612 at various locations along its computational path. In the illustrated embodiment (FIG. 15), the tentative decision sequence V_(0F) is convolved with the DFE's coefficients C₀ through C₃, the sequence V_(1F) is convolved with the DFE's coefficients C₄ and C₅, and the sequence V_(2F) is convolved with the DFE's coefficients C₆ through C₃₂. It is noted that, since the partial ISI component that is subtracted from the deskew memory output 37 to form the signal 1508 is essentially taken (in two steps, as described above) from tap 2 of the DFE 612, this partial ISI component is associated with the DFE's coefficients C₂ through C₃₂. It is also noted that, in another embodiment, instead of using the two-step computation, this partial ISI component can be taken directly from the DFE 612 at point 1515 and subtracted from signal 37 to form signal 1508.
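The tap-to-sequence assignment of FIG. 15 can be expressed as a small function. This sketch assumes each sequence is supplied already time-aligned, so that `seq[k]` is that sequence's estimate of the symbol k+1 periods ago; this alignment convention and the function name are assumptions. When the three sequences agree, the result equals using V_(0F) alone, which is the equivalence noted in paragraph [0185].

```python
from typing import List, Sequence


def jammed_decisions(v0f: Sequence[int], v1f: Sequence[int],
                     v2f: Sequence[int], ntaps: int = 33) -> List[int]:
    """Decision seen by each DFE tap C_k, k = 0..ntaps-1.

    Taps C0..C3 read V0F, taps C4..C5 read V1F, and taps C6..C32 read V2F.
    """
    out = []
    for k in range(ntaps):
        if k <= 3:
            out.append(v0f[k])   # most recent, least reliable decisions
        elif k <= 5:
            out.append(v1f[k])   # one level deeper in the path memory
        else:
            out.append(v2f[k])   # most reliable decisions for the long tail
    return out
```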

[0187] It is noted that the sequences V_(0F), V_(1F), V_(2F) correspond to a hard decision regarding the choice of the best path among the eight paths (path i is the path ending at state i). Thus, the partial ISI component associated with the DFE's coefficients C₂ through C₃₂ is the result of forcing a hard decision on the group of higher ordered coefficients of the DFE 612. The underlying reason for computing only one partial ISI signal instead of eight complete ISI signals for the eight states (as is done conventionally) is to reduce computational complexity and to avoid timing problems. In effect, the combination of the DFE and the MDFE of the present invention can be thought of as performing the functions of a group of eight different conventional DFEs having the same tap coefficients except for the first two tap coefficients.

[0188] For each state, it remains to determine which path to use for the remaining two coefficients, within a very short interval of time (about 16 nanoseconds). This is done by the use of the convolution engine 1511 and the MDFE 602. It is noted that the convolution engine 1511 can be implemented as an integral part of the MDFE 602. It is also noted that, for each constituent transceiver, i.e., for each 1D component of the Viterbi input 614 (the Viterbi input 614 is, in practice, eight 4D Viterbi inputs), there is only one convolution engine 1511 for all eight states, but there are eight replicas of the select logic 610 and eight replicas of the MUX 1512.

[0189] The convolution engine 1511 computes all the possible values for the ISI associated with the coefficients C₀ and C₁. There are only twenty-five possible values, since this ISI is a convolution of these two coefficients with a decision sequence of length 2, and each decision in the sequence can only have five values (−2, −1, 0, +1, +2). Only one of these twenty-five values is a correct value for this ISI. These twenty-five hypotheses of ISI are then provided to the MDFE 602.

[0190] In the MDFE 602, the twenty-five possible values of ISI are subtracted from the partial ISI compensated signal 1508 using a set of adders connected in parallel. The resulting signals are then saturated to fit in a predetermined range, using a set of saturators. The saturated results are then stored in a set of twenty-five registers. Provided that there was no decision error regarding the best path (among the eight paths) forced upon taps 2 through 32 of the DFE 612, one of the twenty-five registers would contain one 1D component of the Viterbi input 614 with the ISI correctly cancelled for one of the eight states.

[0191] For each of the eight states, the generation of the Viterbi input is limited to selecting the correct value out of these 25 possible values. This is done, for each of the eight states, using a 25-to-1 multiplexer 1512 whose select input is the output of the select logic 610. The select logic 610 receives V₀^(i) and V₁^(i) (i = 0, . . . , 7) for a particular state i from the path memory module 608 of the Viterbi block 1502. The select logic 610 uses a pre-computed lookup table to determine the value of the select signal 622A based on the values of V₀^(i) and V₁^(i) for the particular state i. The select signal 622A is one component of the 8-component select signal 622 shown in FIG. 3. Based on the select signal 622A, the 25-to-1 multiplexer 1512 selects one of the contents of the twenty-five registers as a 1D component of the Viterbi input 614 for the corresponding state i.
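Per state, the select logic reduces to a table lookup followed by a 25:1 selection. The sketch below assumes the 25 registers are ordered as a 5-by-5 grid, index = 5*(a + 2) + (b + 2), where a is the PAM-5 value of V₀^(i) (the most recent symbol, paired with C₀) and b is that of V₁^(i) (paired with C₁). The table name, function name and ordering are illustrative assumptions, and the inputs are the 1D components for a single constituent transceiver.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

PAM5 = (-2, -1, 0, 1, 2)

# Pre-computed lookup table of the select logic: (V0, V1) -> register index.
SELECT_LUT: Dict[Tuple[int, int], int] = {
    (a, b): n for n, (a, b) in enumerate(product(PAM5, repeat=2))
}


def viterbi_inputs(registers: Sequence[float],
                   v0: Sequence[int], v1: Sequence[int]) -> List[float]:
    """One 1D component of the Viterbi input for each of the eight states.

    v0[i] and v1[i] are this transceiver's 1D components of V0^(i) and
    V1^(i) for state i. Each list element models one 25:1 MUX.
    """
    return [registers[SELECT_LUT[(v0[i], v1[i])]] for i in range(len(v0))]
```

Because the table is fixed, the select logic adds only a lookup ahead of the MUX, which keeps it off the longer adder and saturator path.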

[0192] FIG. 15 only shows the select logic and the 25-to-1 multiplexer for one state and for one constituent transceiver. There are identical select logics and 25-to-1 multiplexers for the eight states and for each constituent transceiver. In other words, the computation of the 25 values is done only once for all eight states, but the 25:1 MUX and the select logic are replicated eight times, once for each state. The input 614 to the Viterbi decoder 604 is, as a practical matter, eight 4D Viterbi inputs.
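Counting the shared and replicated elements described above across the four constituent transceivers gives the following per-receiver totals. This is a derived count, assuming one convolution engine and one set of 25 registers per constituent transceiver, as stated for the illustrated embodiment:

```latex
N_{\text{select logic}} = N_{25:1\ \text{MUX}} = 8 \times 4 = 32,\qquad
N_{\text{convolution engine}} = 4,\qquad
N_{\text{registers}} = 25 \times 4 = 100
```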

[0193] In the case of the DFE, however, only a single DFE is contemplated for practice of the invention. In contrast to alternative systems where eight DFEs are required, one for each of the eight states imposed by the trellis encoding scheme, a single DFE is sufficient, since the decision as to which path among the eight is the probable best was made in the Viterbi block and forced to the DFE as a tentative decision. State status is maintained at the Viterbi decoder input by controlling the MDFE output with the state specific signals developed by the eight select logics (610 of FIG. 3) in response to the eight state specific signals V₀^(i) and V₁^(i), i = 0, . . . , 7, from the path memory module (608 of FIG. 3). Although identified as a singular DFE, it will be understood that the 4D architectural requirements of the system mean that the DFE is also 4D. Each of the four dimensions (twisted pairs) will exhibit its own independent contribution to ISI, and these contributions should be dealt with accordingly. Thus, the DFE is singular with respect to state architecture, even when its 4D nature is taken into account.

[0194] In the architecture of the system of the present invention, the Viterbi input computation becomes a very small part of the critical path, since the multiplexers have extremely low delay, due largely to the placement of the 25 registers between the saturators and the 25:1 multiplexer. If a register were placed at the input to the MDFE 602 instead, the 25 registers would not be needed. However, this would cause the Viterbi input computation to be a larger part of the critical path, due to the delays caused by the adders and saturators. Thus, by using 25 registers at a location proximate to the MDFE output instead of one register located at the input of the MDFE, the critical path of the MDFE and the Viterbi decoder is broken up into two approximately balanced components. This architecture makes it possible to meet the very demanding timing requirements of the Gigabit Ethernet transceiver.

[0195] Another advantageous factor in achieving high-speed operation of the trellis decoder 38 is the use of heavily truncated representations for the metrics of the Viterbi decoder. Although this may result in a small, nonzero decrease in theoretical performance, the remaining precision is nevertheless quite sufficient to support healthy error margins. Moreover, the use of heavily truncated representations for the metrics of the Viterbi decoder greatly assists in achieving the requisite high operational speeds in a gigabit environment. In addition, the reduced precision facilitates the use of random logic or simple lookup tables to compute the squared errors, i.e., the distance metrics, consequently reducing the use of valuable silicon real estate for merely ancillary circuitry.

[0196] FIG. 16 shows the word lengths used in one embodiment of the Viterbi decoder of this invention. In FIG. 16, the word lengths are denoted by S or U followed by two numbers separated by a period. The first number indicates the total number of bits in the word. The second number indicates the number of bits after the binary point. The letter S denotes a signed number, while the letter U denotes an unsigned number. For example, each 1D component of the 4D Viterbi input is a signed 5-bit number having 3 bits after the binary point (S5.3).
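To make the S/U notation concrete, the sketch below quantizes a value into a fixed-point code with saturation. The rounding mode (round half up) and the function names are assumptions; the description of FIG. 16 specifies only the word lengths. In two's complement, an S5.3 word spans codes −16 to 15, i.e., values −2.0 to 1.875 in steps of 0.125.

```python
import math


def to_fixed(x: float, total_bits: int, frac_bits: int,
             signed: bool = True) -> int:
    """Quantize x to an integer code in S/U total_bits.frac_bits format."""
    scale = 1 << frac_bits
    code = math.floor(x * scale + 0.5)   # round half up (assumed)
    if signed:
        lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    else:
        lo, hi = 0, (1 << total_bits) - 1
    return max(lo, min(hi, code))        # saturate to the representable range


def from_fixed(code: int, frac_bits: int) -> float:
    """Value represented by an integer code with frac_bits fractional bits."""
    return code / (1 << frac_bits)
```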

[0197] FIG. 17 shows an exemplary lookup table that can be used to compute the squared 1-dimensional errors. The logic function described by this table can be implemented using read-only memory devices, random logic circuitry or PLA circuitry. Logic design techniques well known to a person of ordinary skill in the art can be used to implement the logic function described by the table of FIG. 17 in random logic.

[0198] FIGS. 18A and 18B provide a more complete table describing the computation of the decisions and squared errors for both the X and Y subsets directly from one component of the 4D Viterbi input to the 1D slicers (FIG. 7). This table completely specifies the operation of the slicers of FIG. 7.
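A table of this kind can be generated offline by enumerating every possible input code. The sketch below is not the table of FIGS. 17, 18A or 18B. It assumes, for illustration, an S5.3 input, an X subset {−1, +1} and a Y subset {−2, 0, +2}, and exact (unquantized) squared errors. The subset definitions, the tie-breaking toward the first listed point, and all names are assumptions.

```python
from typing import Dict, Sequence, Tuple

X_SUBSET = (-1, 1)       # assumed 1D subset X (illustrative)
Y_SUBSET = (-2, 0, 2)    # assumed 1D subset Y (illustrative)

Entry = Tuple[int, float]  # (1D decision, squared error)


def slice_and_square(x: float, subset: Sequence[int]) -> Entry:
    """Nearest subset point (the 1D decision) and its squared error."""
    decision = min(subset, key=lambda s: (x - s) ** 2)  # ties -> first point
    return decision, (x - decision) ** 2


def build_lut(total_bits: int = 5,
              frac_bits: int = 3) -> Dict[int, Tuple[Entry, Entry]]:
    """Map every signed input code to its (X, Y) decisions and squared errors."""
    scale = 1 << frac_bits
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    lut = {}
    for code in range(lo, hi + 1):
        x = code / scale
        lut[code] = (slice_and_square(x, X_SUBSET),
                     slice_and_square(x, Y_SUBSET))
    return lut
```

With only 32 possible input codes, the table is small enough for a ROM, PLA or random logic, which is the point made about reduced precision in paragraph [0195].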

[0199] An exemplary demodulator including a high speed decoder has been described, which includes various components that facilitate robust and accurate acquisition and decoding of PAM-5 constellation signals at speeds consistent with gigabit operation. Symbol decoding, including ISI compensation, is accurately performed in a symbol period of about 8 ns by a transceiver demodulator circuit constructed so as, first, to bifurcate the ISI compensation function between an FFE, operating to compensate ISI induced by the partial response pulse shaping filter of the remote transmitter, and a decoder, operating to compensate ISI perturbations induced by transmission channel characteristics, and, second, to bifurcate critical path computations into substantially balanced first and second portions, the first portion including computations performed in a DFE and MDFE element and the second portion including computations performed in a Viterbi decoder.

[0200] The DFE element is further advantageous in that it is implemented as only a single conceptual DFE (taking into account its 4D nature) rather than an eight-element stack, each element of which defines a multi-dimensional input to an eight-state Viterbi. The DFE is “stuffed”, at particular chosen locations, by the first several stages of a sequential, multi-stage tentative decision path memory module, so as to develop a set of “tail” coefficient values in the DFE which, taken together, represent the algebraic sum of a truncated set of DFE coefficients C₂ to C₃₂. A received symbol, represented by a five-level constellation, is convolved with the remaining two DFE coefficients, C₀ and C₁, which are taken to represent the transmission channel induced ISI.

[0201] As deskewed signals enter the decoder, the previous symbols, convolved with the DFE coefficients C₃ to C₃₂, are first subtracted from them. Then the previous symbol convolved with C₂ is subtracted, and the resultant (intermediate) symbol is directed to the MDFE. This resultant signal might be described as the received symbol with the partial ISI introduced by previous symbols subtracted. In the MDFE, all possible convolutions of the primary coefficients, C₀ and C₁, with the possible symbol values are subtracted from the intermediate symbol to provide a received symbol without perturbations induced by ISI.

[0202] It will be evident to one having skill in the art that, although the transceiver has been described in the context of a trellis encoded PAM-5 signal representation communicated over a multi-pair transmission channel, the invention is not limited to any particular communication technique. Specifically, the decoder architecture and signal processing methodology in accord with the invention are suitable for use with any form of communication in which the symbolic content of the communication is represented by multi-level signals. Indeed, the invention becomes particularly appropriate as the number of signal levels increases.

[0203] Neither is the invention limited to signals encoded in accordance with a 4D, eight-state trellis methodology. Trellis encoding forces the system to be constructed so as to accommodate the eight states inherent in the trellis methodology. Other coding methodologies and architectures are expressly contemplated by the invention and can be implemented by making the proper modifications to an alternative coding architecture's “state width”, as will be apparent to a skilled integrated circuit transceiver designer. Likewise, the “dimensional depth” (1D, 2D, 4D, . . ., for example) may be suitably increased or decreased to accommodate different forms of transmission channel implementations. As in the case of increasing signal level representations, the systems and methods of the invention are particularly suitable for channels with increased “depth”, such as six, eight, or even higher numbers of twisted pair cables, single conductor cabling, parallel wireless channels, and the like.

[0204] In the context of an exemplary integrated circuit-type bidirectional communication system, a further aspect of the invention might be characterized as a system and method for adaptively and dynamically regulating the power consumption of an integrated circuit communication system as a function of particular, user-defined signal quality metrics. Signal quality metrics might include a signal's bit error rate (BER), a signal-to-noise ratio (SNR) specification, a noise margin figure, dynamic range, or the like. Indeed, signal quality is a generalized term used to describe a signal's functional fidelity.

[0205] As will be understood by one having skill in the art, signal quality is a measurable operational characteristic of various component portions of modern communication systems. Various forms of signal quality metrics are used to define the features and functionality of the signal processing portions of integrated circuit communication devices, particularly coder/decoder circuitry, equalizers and filters, each of which requires a large amount of silicon real estate for effective implementation and consequently consumes a large amount of power during operation.

[0206] Turning now to FIG. 28, the invention might be described briefly as a methodology for balancing the conflicting circuit performance requirements represented by signal quality and power consumption, and might be illustrated as the implementation of a decision matrix having power consumption as one of the dimensions and a chosen signal quality metric as another. From FIG. 28, it will be understood that integrated circuit power consumption is directly related to processed signal quality. This is particularly true in the case of integrated circuits incorporating high order digital filter elements having a large number of taps, all of which consume power when in operation.

[0207] However, it has been generally accepted integrated circuit design practice to construct an integrated circuit communication device to accommodate the most stringent digital processing that might be required of the device in an actual application. In the case of an Ethernet transceiver, for example, provision must be made for processing signals transmitted over a wide variety of transmission channels exhibiting widely disparate characteristics, ranging from extremely lossy, highly populated, long wiring run channels to very short (<2 meters) point-to-point installations. In either case, all of the signal processing elements of conventional transceiver circuitry are operative to process a signal, whether needed or not, such that power consumption is relatively constant and large.

[0208] In FIG. 28, the evaluation matrix judges an output signal quality metric against a threshold standard and, where a measured quality metric is greater than the threshold, allows the power consumption of the device to be reduced by turning off various functional processing blocks until the output signal quality is reduced to the threshold value. This approach has particular utility in the case of digital filter elements, coder/decoder circuitry and equalizers, all of which include multiple elements that are required for processing signals propagated through harsh channel environments, but are to various degrees unnecessary when signals are propagated through a more benign channel.

[0209] The evaluation matrix, as exemplified in FIG. 28, might be initialized by a user input requirement, such as the degree to which power consumption is an issue. A particular power consumption value might be set as an operational parameter (indicated as “P” in FIG. 28), and portions of the device adaptively turned off until the desired power value is reached. This will necessarily affect the quality of a signal processed by such truncated circuitry but, in accordance with the invention, signal quality can be locally maximized for a predetermined power consumption metric, such that device performance is not unduly sacrificed.

[0210] Various portions of the device might be powered down in predetermined sequential combinations, with each combination resulting in a particular performance metric. Signal performance is evaluated at each sequential step. Thus, any one power consumption specification, i.e., “P”, will give a range of performance values (represented as “A” in FIG. 28). The best signal performance result is the chosen metric for deciding which of the multiplicity of power-down configurations is implemented. Consequently, where power is the primary concern, signal quality defaults to the best signal performance achievable at the specified power level.
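The power-first selection rule can be sketched as a search over candidate configurations. This is a hypothetical model: the `Config` fields, the configuration names and all numbers are made up for illustration, and the quality metric is assumed to be one where higher is better (such as an SNR margin). A metric such as BER, where lower is better, would need the comparison reversed.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Config:
    name: str         # which blocks are powered down (hypothetical label)
    power_mw: float   # power consumption of this configuration
    quality: float    # signal quality metric, higher is better (assumed)


# Hypothetical power-down configurations; the numbers are illustrative only.
EXAMPLE_CONFIGS: List[Config] = [
    Config("all blocks on", 900.0, 12.0),
    Config("echo taps truncated", 750.0, 10.5),
    Config("NEXT taps truncated", 700.0, 9.0),
    Config("trellis decoder -> slicer", 600.0, 6.0),
]


def best_quality_within_power(configs: Sequence[Config],
                              p_max: float) -> Optional[Config]:
    """Power is primary ("P"): best quality among configurations within budget."""
    feasible = [c for c in configs if c.power_mw <= p_max]
    return max(feasible, key=lambda c: c.quality, default=None)
```

Returning `None` when no configuration fits the budget makes an infeasible power specification explicit instead of silently picking the lowest-power option.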

[0211] Where signal quality (performance) is the primary concern, thesystem is allowed to function normally, with all processing blocksoperative. In this circumstance, power consumption will be expected tobe nominal.

[0212] Where signal quality is desirable, but some accommodation must be made for power consumption, a user may set a signal quality metric as a threshold standard (indicated as “Q” in FIG. 28) and allow the system to adaptively and dynamically run through a multiplicity of power-down configurations, resulting in a range of power consumption values (indicated as “B” in FIG. 28), in order to determine which of the configurations gives the lowest power consumption while retaining the desired signal quality metric. This methodology is particularly effective in high order filters with multiple taps, and in decoder blocks that might implement a trellis decoder in a fully functional form, but which might be adequate when truncated to a simple slicer in certain situations.
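The quality-first rule is the mirror image: among configurations that meet the threshold Q, choose the one with the lowest power. As before, this is a hypothetical sketch with made-up configuration names and numbers, assuming a quality metric where higher is better.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Config:
    name: str         # which blocks are powered down (hypothetical label)
    power_mw: float   # power consumption of this configuration
    quality: float    # signal quality metric, higher is better (assumed)


# Hypothetical power-down configurations; the numbers are illustrative only.
EXAMPLE_CONFIGS: List[Config] = [
    Config("all blocks on", 900.0, 12.0),
    Config("echo taps truncated", 750.0, 10.5),
    Config("NEXT taps truncated", 700.0, 9.0),
    Config("trellis decoder -> slicer", 600.0, 6.0),
]


def lowest_power_meeting_quality(configs: Sequence[Config],
                                 q_min: float) -> Optional[Config]:
    """Quality threshold ("Q"): lowest power among configs meeting it."""
    feasible = [c for c in configs if c.quality >= q_min]
    return min(feasible, key=lambda c: c.power_mw, default=None)
```

Because channel conditions change over time, a real controller would re-measure quality and re-run this selection periodically, as described in the following paragraph.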

[0213] No matter how it is implemented, however, all that is required for practice of the invention is that power consumption be established as one basis of an evaluation matrix, and that some signal quality or device performance characteristic, having a relationship to device power consumption, be established as another. As one of the bases is defined, by a user input, for example, the other basis is locally maximized (in the case of performance) or minimized (in the case of power) by an adaptive and dynamic procedure that chooses the most pertinent portions of an integrated circuit to disable. The procedure is adaptive in the sense that it is not fixed in time. As channel and signal characteristics can be expected to vary with time, a changing signal quality metric will force a re-evaluation of the matrix, and a further reduction of power consumption or a further enhancement of signal quality may be obtained.

[0214] In order to appreciate the advantages of the present invention, it will be beneficial to describe the invention in the context of an exemplary bidirectional communication device, such as an Ethernet transceiver. The particular exemplary implementation chosen is depicted in FIG. 1, which is a simplified block diagram of a multi-pair communication system operating in conformance with the IEEE 802.3ab standard (also termed 1000BASE-T) for 1 gigabit per second (Gb/s) Ethernet full-duplex communication over four twisted pairs of Category-5 copper wires.

[0215] To simplify the explanation, the communication system illustrated in FIG. 1 is represented as a point-to-point system. It includes two main transceiver blocks 102 and 104, coupled together via four twisted-pair cables 112 a, b, c and d. Each of the wire pairs 112 a, b, c, d is coupled to each of the transceiver blocks 102, 104 through a respective one of four line interface circuits 106. Each of the wire pairs 112 a, b, c, d carries information between corresponding pairs of four pairs of transmitter/receiver circuits (constituent transceivers) 108. Each of the constituent transceivers 108 is coupled between a respective line interface circuit 106 and a Physical Coding Sublayer (PCS) block 110. At each of the transceiver blocks 102 and 104, the four constituent transceivers 108 can operate simultaneously at 250 megabits of information data per second (Mb/s) each. They are coupled to the corresponding remote constituent transceivers through respective line interface circuits to support full-duplex bidirectional operation. Each of the transceiver blocks 102 and 104 therefore achieves a communication throughput of 1 Gb/s. This is done by using four 250 Mb/s (125 Mbaud at 2 information data bits per symbol) constituent transceivers 108 in each of the transceiver blocks 102, 104, and four pairs of twisted copper cables to connect the two transceiver blocks 102, 104 together.
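The 1 Gb/s figure follows from the per-pair numbers given above:

$$4 \times 125\ \text{Mbaud} \times 2\ \frac{\text{bits}}{\text{symbol}} = 4 \times 250\ \text{Mb/s} = 1000\ \text{Mb/s} = 1\ \text{Gb/s}$$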

[0216] The exemplary communication system of FIG. 1 has a superficial resemblance to a 100BASE-T4 system, but it is configured to operate at ten times the bit rate. Certain system performance characteristics, such as sampling rates, are therefore correspondingly higher and cause greater power consumption. In addition, at gigabit data rates over potentially noisy channels, a proportionately greater degree of signal processing is often required to ensure adequate signal fidelity and quality.

[0217] FIG. 2 is a simplified block diagram of the functional architecture and internal construction of an exemplary transceiver block, indicated generally at 200, such as transceiver 102 of FIG. 1. Since the illustrative transceiver application relates to gigabit Ethernet transmission, the transceiver will be referred to as the “gigabit transceiver”. For ease of illustration and description, FIG. 2 shows only one of the four 250 Mb/s constituent transceivers, which operate simultaneously (termed herein 4-D operation). However, the operations of the four constituent transceivers are necessarily interrelated. For that reason, certain blocks in the exemplary embodiment of FIG. 2 perform four-dimensional operations, and certain signal lines carry four-dimensional (4-D) signals. “4-D” means that the data from the four constituent transceivers are used simultaneously. To clarify signal relationships in FIG. 2, thin lines correspond to 1-dimensional functions or signals (i.e., relating to only a single constituent transceiver). Thick lines correspond to 4-D functions or signals (relating to all four constituent transceivers).

[0218] Referring to FIG. 2, the gigabit transceiver 200 includes a Gigabit Medium Independent Interface (GMII) block 202, subdivided into a receive GMII circuit 202R and a transmit GMII circuit 202T. The transmitter portion of the transceiver generally includes the following blocks:

- a Physical Coding Sublayer (PCS) block 204, subdivided into a receive PCS circuit 204R and a transmit PCS circuit 204T;
- a pulse shaping filter 206;
- a digital-to-analog (D/A) converter block 208;
- a line interface block 210.

[0219] The receiver portion generally includes the following blocks:

- a highpass filter 212;
- a programmable gain amplifier (PGA) 214;
- an analog-to-digital (A/D) converter 216;
- an automatic gain control (AGC) block 220;
- a timing recovery block 222;
- a pair-swap multiplexer block 224;
- a demodulator 226;
- an offset canceller 228;
- a near-end crosstalk (NEXT) canceller block 230 having three constituent NEXT cancellers;
- an echo canceller 232.

[0220] The gigabit transceiver 200 also includes an A/D first-in-first-out buffer (FIFO) 218, which ensures proper transfer of data from the analog clock region to the receive clock region. A loopback FIFO block (LPBK) 234 ensures proper transfer of data from the transmit clock region to the receive clock region. The gigabit transceiver 200 can optionally include an additional adaptive filter to cancel far-end crosstalk noise (FEXT canceller).

[0221] On the transmit path, the transmit section 202T of the GMII block receives data from the Media Access Control (MAC) module in byte-wide format at a rate of 125 MHz. It passes the data to the transmit section 204T of the PCS block via the FIFO 201. The transmit clock of the PHY layer is not necessarily synchronized with the clock of the MAC layer, so the FIFO 201 is needed to ensure proper data transfer from the MAC layer to the physical (PHY) layer. In one embodiment, this small FIFO 201 has from about three to about five memory cells. This accommodates the elasticity requirement, which is a function of frame size and frequency offset.

[0222] The PCS transmit section 204T performs certain scrambling operations. In particular, it is responsible for encoding digital data into the codeword representations required for transmission. In the illustrated embodiment of FIG. 2, the transmit PCS section 204T incorporates a coding engine and signal mapper that implement a trellis coding architecture, as required by the IEEE 802.3ab specification for gigabit transmission.

[0223] In accordance with this encoding architecture, the PCS transmit section 204T generates four 1-D symbols, one for each of the four constituent transceivers. The 1-D symbol generated for the constituent transceiver depicted in FIG. 2 is filtered by the pulse shaping filter 206. This filtering helps reduce radiated emissions from the transceiver output so that they fall within the limits required by the Federal Communications Commission. The pulse shaping filter 206 has a transfer function of 0.75+0.25z⁻¹. This implementation is chosen so that the power spectrum of the transceiver output falls below the power spectrum of a 100BASE-TX signal. 100BASE-TX is a widely used and accepted Fast Ethernet standard for 100 Mb/s operation over two pairs of Category-5 twisted-pair cable. The output of the pulse shaping filter 206 is converted to an analog signal by the D/A converter 208, which operates at 125 MHz. The analog signal passes through the line interface block 210 and is placed on the corresponding twisted-pair cable.
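A minimal discrete-time model of the pulse shaping filter 206 is shown below. Each output sample is 0.75 times the current 1-D symbol plus 0.25 times the previous one. The function name and the zero initial state are assumptions made for illustration.

```python
from typing import List, Sequence


def pulse_shape(symbols: Sequence[int]) -> List[float]:
    """Apply H(z) = 0.75 + 0.25 z^-1 to a stream of PAM-5 symbols in {-2..+2}."""
    out: List[float] = []
    prev = 0  # assumed zero initial state of the z^-1 delay element
    for s in symbols:
        out.append(0.75 * s + 0.25 * prev)
        prev = s
    return out


print(pulse_shape([2, -1, 0, 1]))  # [1.5, -0.25, -0.25, 0.75]
```

The two taps sum to 1. Consequently, a pair of full-scale symbols never produces an output larger than 2 in magnitude, so the filter does not expand the transmit amplitude range.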

[0224] On the receive path, the line interface block 210 receives an analog signal from the twisted-pair cable. The received analog signal is preconditioned by the highpass filter 212 and the PGA 214. It is then converted to a digital signal by the A/D converter 216, which samples at 125 MHz. The timing of the A/D converter 216 is controlled by the output of the timing recovery block 222. The A/D FIFO 218 transfers the resulting digital signal from the analog clock region to the receive clock region. The AGC 220 also uses the output of the A/D FIFO 218 to control the operation of the PGA 214.

[0225] The output of the A/D FIFO 218, along with the outputs from the A/D FIFOs of the other three constituent transceivers, is input to the pair-swap multiplexer block 224. The pair-swap multiplexer block 224 uses the 4-D pair-swap control signal from the receive section 204R of the PCS block to sort the four input signals and send the correct signal to each of the feedforward equalizers 26 of the demodulator 226.

This pair-swapping control is needed for the following reason. The trellis coding methodology used in the gigabit transceivers (102 and 104 of FIG. 1) assumes that the signal on each twisted pair corresponds to its own 1-D constellation, and that the signals on the four twisted pairs together form a 4-D constellation. For the decoding to work, each of the four twisted pairs must therefore be uniquely identified with one of the four dimensions. Any undetected swapping of the four pairs would result in erroneous decoding. In an alternate embodiment of the gigabit transceiver, the pair-swapping control is performed by the demodulator 226 instead of by the combination of the PCS receive section 204R and the pair-swap multiplexer block 224.

[0226] The demodulator 226 includes a feed-forward equalizer (FFE) 26 for each constituent transceiver. The FFEs are coupled to a deskew memory circuit 36 and a decoder circuit 38, which is implemented in the illustrated embodiment as a trellis decoder. The deskew memory circuit 36 and the trellis decoder 38 are common to all four constituent transceivers. Each FFE 26 receives its own signal from the pair-swap multiplexer block 224. The FFE 26 includes a precursor filter 28, a programmable inverse partial response (IPR) filter 30, a summing device 32, and an adaptive gain stage 34. The FFE 26 is a least-mean-squares (LMS) type adaptive filter configured to perform channel equalization, as described in greater detail below.

[0227] The precursor filter 28 generates a precursor to the input signal 2. This precursor is used for timing recovery. The transfer function of the precursor filter 28 might be represented as −γ+z⁻¹. Here γ equals 1/16 for short cables (less than 80 meters) and 1/8 for long cables (more than 80 meters). The cable length is determined from the gain of the coarse PGA 14 of the programmable gain block 214.
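The following Python sketch applies the precursor transfer function −γ+z⁻¹ sample by sample. It selects γ from a cable length given in meters. That length input is a hypothetical simplification: in the text, cable length is inferred from the coarse PGA gain. The text also does not say how a cable of exactly 80 m is classified, so the sketch's choice for that case is an assumption.

```python
from typing import List, Sequence


def precursor_gamma(cable_length_m: float) -> float:
    """Return gamma = 1/16 for short cables and 1/8 for long ones.

    The text gives < 80 m as short and > 80 m as long; treating exactly
    80 m as long is an assumption of this sketch.
    """
    return 1 / 16 if cable_length_m < 80 else 1 / 8


def precursor_filter(x: Sequence[float], gamma: float) -> List[float]:
    """Apply H(z) = -gamma + z^-1, i.e. y[n] = -gamma * x[n] + x[n - 1]."""
    y: List[float] = []
    prev = 0.0  # assumed zero initial state of the delay element
    for xn in x:
        y.append(-gamma * xn + prev)
        prev = xn
    return y
```

For an impulse input, the filter outputs a small negative sample followed by the delayed unit sample. The ratio of these two samples is what the timing recovery loop can observe.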

[0228] The programmable IPR filter 30 compensates the ISI (intersymbol interference) introduced by the partial response pulse shaping in the transmitter section of the remote transceiver that transmitted the analog equivalent of the digital signal 2. The transfer function of the IPR filter 30 may be expressed as 1/(1+Kz⁻¹). In the present example, K has an exemplary value of 0.484375 during startup. After the decision feedback equalizer inside the trellis decoder 38 has converged, K is slowly ramped down to zero. K may also take any positive value strictly less than 1.
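The IPR transfer function 1/(1+Kz⁻¹) corresponds to the recursion y[n] = x[n] − K·y[n−1]. Its pole lies at z = −K, so requiring K to be below 1 keeps the filter stable. At K = 0, the filter reduces to a pass-through. The sketch below pairs the recursion with the FIR response (1+Kz⁻¹) that it inverts. That FIR is only an illustrative stand-in and is not claimed to equal the 0.75+0.25z⁻¹ shaping filter described above.

```python
from typing import List, Sequence


def ipr_filter(x: Sequence[float], k: float) -> List[float]:
    """Inverse partial response 1/(1 + K z^-1): y[n] = x[n] - K * y[n - 1]."""
    if not 0 <= k < 1:
        raise ValueError("K must satisfy 0 <= K < 1 for a stable filter")
    y: List[float] = []
    prev = 0.0  # assumed zero initial state
    for xn in x:
        prev = xn - k * prev
        y.append(prev)
    return y


def partial_response(x: Sequence[float], k: float) -> List[float]:
    """Illustrative FIR (1 + K z^-1) that ipr_filter undoes."""
    return [xn + k * (x[n - 1] if n > 0 else 0.0) for n, xn in enumerate(x)]


k = 0.484375  # startup value of K quoted in the text
print(ipr_filter(partial_response([1.0, 2.0, -1.0, 0.0], k), k))
```

Because 0.484375 = 31/64 is exactly representable in binary, the cascade in the last line recovers the input symbols exactly in floating point.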

[0229] The summing device 32 receives the output of the IPR filter 30 and subtracts from it the adaptively derived cancellation signals received from the adaptive filter block. These are the signals developed by the offset canceller 228, the NEXT cancellers 230, and the echo canceller 232. The offset canceller 228 is an adaptive filter that estimates the signal offset introduced by component circuitry of the transceiver's analog front end, particularly the offsets introduced by the PGA 214 and the A/D converter 216.

[0230] The three NEXT cancellers 230 may also be described as adaptive filters. In the illustrated embodiment, they model the NEXT impairments in the received signal, which are caused by interference from the symbols sent by the three local transmitters of the other three constituent transceivers. These impairments arise from crosstalk between neighboring cable pairs, hence the term near-end crosstalk, or NEXT. Each receiver has access to the data transmitted by the other three local transmitters, so the NEXT impairments can be approximately replicated through filtering. Referring to FIG. 2, the three NEXT cancellers 230 filter the signals sent by the PCS block to the other three local transmitters. They produce three signals that replicate the respective NEXT impairments. Subtracting these three signals from the output of the IPR filter 30 approximately cancels the NEXT impairments.

[0231] Because the channel is bidirectional, each local transmitter causes an echo impairment on the received signal of the local receiver with which it is paired to form a constituent transceiver. An echo canceller 232 is provided to remove this impairment. The echo canceller 232 may also be characterized as an adaptive filter, and in the illustrated embodiment it models the signal impairment due to echo. It filters the signal sent by the PCS block to the local transmitter associated with the receiver, and produces an approximate replica of the echo impairment. Subtracting this replica from the output of the IPR filter 30 approximately cancels the echo impairment.
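The NEXT cancellers and the echo canceller both build a replica of an impairment from locally known transmit symbols and subtract it. The Python sketch below models that process with a small FIR filter whose coefficients are adapted by LMS. The echo path `echo_path`, the step size `mu`, and the absence of a far-end signal in the usage lines are assumptions. They were chosen to make convergence easy to see and are not values from the text.

```python
import random
from typing import List, Sequence, Tuple


def lms_canceller(tx: Sequence[float], rx: Sequence[float], n_taps: int,
                  mu: float) -> Tuple[List[float], List[float]]:
    """Adapt an FIR replica of the tx-to-rx coupling path and subtract it.

    Returns the final coefficients and, per sample, the residual rx - replica.
    """
    coeffs = [0.0] * n_taps
    history = [0.0] * n_taps           # history[i] holds tx[n - i]
    residual: List[float] = []
    for x, r in zip(tx, rx):
        history = [x] + history[:-1]
        replica = sum(c * h for c, h in zip(coeffs, history))
        err = r - replica              # what remains after cancellation
        coeffs = [c + mu * err * h for c, h in zip(coeffs, history)]  # LMS update
        residual.append(err)
    return coeffs, residual


# Usage: a hypothetical 3-tap echo path driven by random PAM-5 symbols.
rng = random.Random(1)
echo_path = [0.3, -0.1, 0.05]
tx = [rng.choice([-2, -1, 0, 1, 2]) for _ in range(5000)]
rx = [sum(echo_path[i] * tx[n - i] for i in range(3) if n - i >= 0)
      for n in range(len(tx))]
coeffs, residual = lms_canceller(tx, rx, n_taps=3, mu=0.01)
```

With no far-end signal or noise in `rx`, the coefficients converge to the echo path itself. In a real receiver, the far-end signal acts as noise on the LMS update, which is one reason small step sizes are typical.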

[0232] The adaptive gain stage 34 receives the processed signal from the summing circuit 32 and fine-tunes the signal path gain using a zero-forcing LMS algorithm. Because this adaptive gain stage 34 trains on error signals generated by the adaptive filters 228, 230 and 232, it provides a more accurate signal gain than the PGA 214 in the analog section.

[0233] The output of the adaptive gain stage 34, which is also the output of the FFE 26, is input to the deskew memory circuit 36. The deskew memory 36 is a four-dimensional function block: it also receives the outputs of the FFEs of the other three constituent transceivers. The outputs of the four FFEs are the four signal samples representing the four symbols to be decoded, and they may be skewed relative to one another. This relative skew can be up to 50 nanoseconds. It is caused by variations in the way the copper wire pairs are twisted. To decode the four symbols correctly, the four signal samples must be properly aligned. The deskew memory aligns the four signal samples received from the four FFEs, then passes the deskewed samples to the decoder circuit 38 for decoding.
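At 125 Mbaud the symbol period is 8 ns. A 50 ns skew therefore spans 6.25 symbol periods, so each pair needs at least 7 symbols of deskew buffering. The Python sketch below computes that depth and aligns four sample streams into 4-D samples. It assumes that each stream's lag, in whole symbols, is already known. How the lags are measured is not described here, and the `lags` input is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

SYMBOL_PERIOD_NS = 1e3 / 125  # 8 ns at 125 Mbaud


def deskew_depth(max_skew_ns: float) -> int:
    """Symbols of buffering needed per pair to absorb max_skew_ns of skew."""
    return math.ceil(max_skew_ns / SYMBOL_PERIOD_NS)


def deskew(streams: Sequence[Sequence[float]],
           lags: Sequence[int]) -> List[Tuple[float, ...]]:
    """Align per-pair FFE output streams into 4-D samples.

    lags[k] is how many symbol periods stream k arrives late (hypothetical input).
    """
    trimmed = [s[lag:] for s, lag in zip(streams, lags)]
    n = min(len(t) for t in trimmed)
    return [tuple(t[i] for t in trimmed) for i in range(n)]


print(deskew_depth(50))  # 7
```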

[0234] In the exemplary embodiment, the data received at the local transceiver was encoded at the remote transceiver before transmission. In the present case, data might be encoded using an 8-state four-dimensional trellis code, and the decoder 38 might therefore be implemented as a trellis decoder. In the absence of intersymbol interference (ISI), a proper 8-state Viterbi decoder would decode this code optimally. However, in the case of Gigabit Ethernet, the Category-5 twisted-pair cable introduces a significant amount of ISI. The partial response filter of the remote transmitter at the other end of the communication channel also contributes some ISI. The trellis decoder 38 must therefore decode both the trellis code and the ISI, at the high rate of 125 MHz. In the illustrated embodiment of the gigabit transceiver, the trellis decoder 38 includes an 8-state Viterbi decoder and uses a decision-feedback sequence estimation approach to deal with the ISI components.

[0235] The 4-D output of the trellis decoder 38 is provided to the PCS receive section 204R. The receive section 204R of the PCS block de-scrambles and decodes the symbol stream. It then passes the decoded packets and idle stream to the receive section 202R of the GMII block, which passes them to the MAC module. The trellis decoder 38 also produces two 4-D outputs: an error and a tentative decision. Both are provided to the timing recovery block 222, whose output controls the sampling time of the A/D converter 216. One of the four error components and one of the four tentative-decision components correspond to the receiver shown in FIG. 2. These two components are provided to the adaptive gain stage 34 of the FFE 26 to adjust the gain of the equalizer signal path. The error component of the decoder output is also provided, as a control signal, to the adaptation circuitry in each of the adaptive filters 228, 229, 230, 231 and 232. This adaptation circuitry updates and trains the filter coefficients.

[0236] FIG. 3 is a block diagram of the trellis decoder 38 of FIG. 2. The trellis decoder 38 includes the following blocks:

- a multiple decision feedback equalizer (MDFE) 602;
- a Viterbi decoder 604;
- a path metrics module 606;
- a path memory module 608;
- a select logic 610;
- a decision feedback equalizer 612.

There are eight Viterbi inputs and eight Viterbi decisions, corresponding to the eight states. Each of the eight Viterbi inputs is a 4-dimensional vector whose four components are the Viterbi inputs for the four constituent transceivers. Likewise, each of the eight Viterbi decisions is a 4-dimensional vector whose four components are the Viterbi decisions for the four constituent transceivers.

[0237] The adaptive filters used to implement the echo canceller 232 and the NEXT cancellers 229, 230 and 231 are typically finite impulse response (FIR) filters. FIG. 29A shows the structure of an adaptive FIR filter used as an echo/NEXT canceller in one embodiment of the gigabit transceiver.

[0238] Referring to FIG. 29A, the adaptive FIR filter includes an input signal path P_(in), an output signal path P_(out), and N taps (N is nine in FIG. 29A). Each tap connects a point on the input signal path P_(in) to a point on the output signal path P_(out). Each tap except the last includes a coefficient C_(i), a multiplier M_(i) and an adder A_(i), i=0, . . . , N−2. The last tap includes the coefficient C_(N−1), the multiplier M_(N−1), and no adder. The coefficients C_(i), i=0, . . . , N−1, are stored in coefficient registers. During each adaptation process, adaptation circuitry (not shown in FIG. 29A) trains the values of the coefficients C_(i) using the well-known least-mean-squares algorithm. After training, the coefficients C_(i) converge to stable values. The FIR filter includes a set of delay elements D_(i), conventionally denoted by z⁻¹ in FIG. 29A. The number of delay elements D_(i) determines the order of the FIR filter. The filter output y(n) at time instant n is a function of the input at time instant n and of the past inputs at time instants n−1 through n−(N−1). It is expressed as:

$$y(n) = \sum_{i=0}^{N-1} C_{i}\, x(n-i) \qquad (1)$$

[0239] where x(n−i) denotes the input at time instant n−i, and N denotes the number of taps. As Equation (1) shows, the output y(n) is a weighted sum of the input data x(n−i), with i=0, . . . , N−1. The coefficients C_(i) act as the weighting factors on the input data. If a coefficient C_(i) has a very small absolute value relative to the other coefficients, then the corresponding input data x(n−i) contributes relatively little to the value of y(n).
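Equation (1) can be evaluated directly, as in the short Python sketch below. Inputs before time 0 are taken as zero, which is an assumption made for the example.

```python
from typing import List, Sequence


def fir_output(coeffs: Sequence[float], x: Sequence[float]) -> List[float]:
    """Evaluate Equation (1): y(n) = sum_{i=0}^{N-1} C_i * x(n - i).

    Inputs before time 0 are treated as zero (an assumption of this sketch).
    """
    n_taps = len(coeffs)
    return [sum(coeffs[i] * x[n - i] for i in range(n_taps) if n - i >= 0)
            for n in range(len(x))]


print(fir_output([1.0, 0.5], [1.0, 2.0, 3.0]))  # [1.0, 2.5, 4.0]
```

The code makes the point of the paragraph concrete. A coefficient of, say, 1e-4 can change y(n) by at most 1e-4 times the largest input magnitude. That is why taps with very small coefficients are natural candidates for deactivation.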

[0240] FIG. 29B shows a structure equivalent to the filter of FIG. 29A. The two structures in FIGS. 29A and 29B have the same transfer function but differ in certain performance characteristics. The difference comes from where the delay elements D_(i), i=1, . . . , N−1 (N=9 in FIGS. 29A, 29B) are placed.

- If all the delay elements are placed on the input path P_(in), as in the well-known direct form of the FIR filter, the registers that implement them are small. They need only be the same width as the input data x(n).
- If all the delay elements are placed on the output path P_(out), as in the well-known transposed form of the FIR filter, the registers must be wider. They have to hold the largest possible sum of products C_(i)*x(n−i).

Large registers cost more and consume more power than small ones. Placing the delay elements on the input path therefore has the advantage of requiring fewer register bits. However, the more delay elements there are on the input path, the lower the operating speed of the filter.
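The Python sketch below illustrates the two extremes described above. It implements the direct form, with every delay on the input path, and the transposed form, with every delay on the output path. For the same coefficients and input, the two produce identical outputs. Integer data is used so that the equality is exact. The hybrid splits of FIGS. 29A and 29B lie between these two cases and are not modeled here.

```python
from typing import List, Sequence


def direct_form(c: Sequence[int], x: Sequence[int]) -> List[int]:
    """All delays on the input path: the registers hold past inputs x(n - i)."""
    taps = [0] * len(c)               # narrow, input-width delay line
    out = []
    for xn in x:
        taps = [xn] + taps[:-1]
        out.append(sum(ci * ti for ci, ti in zip(c, taps)))
    return out


def transposed_form(c: Sequence[int], x: Sequence[int]) -> List[int]:
    """All delays on the output path: the registers hold partial sums."""
    n = len(c)
    state = [0] * (n - 1)             # wide registers holding sums of products
    out = []
    for xn in x:
        out.append(c[0] * xn + (state[0] if n > 1 else 0))
        for k in range(n - 2):        # state[k] reads the previous state[k + 1]
            state[k] = c[k + 1] * xn + state[k + 1]
        if n > 1:
            state[n - 2] = c[n - 1] * xn
    return out
```

In `transposed_form`, each register in `state` accumulates products from several taps. This is the reason the text says output-path registers need more bits than input-path registers.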

[0241] If the propagation delay from the input of the filter to the last tap exceeds the required clock period, the filter is not usable. Placing all the delay elements on the input path P_(in) would produce such a long propagation delay. To break that delay into short intervals, some of the delay elements are placed on the output path P_(out) at regular intervals, as shown in the filter structures of FIGS. 29A and 29B. The structure in FIG. 29B has a “two-to-one” split of delay elements between the input path and the output path. It can therefore operate at a higher clock speed than the structure in FIG. 29A, which has a “three-to-one” split. Computational results show that both structures are acceptable for use in a high-speed system such as the gigabit transceiver.

[0242] The taps of the adaptive FIR filters used in the gigabit transceiver can be switched from an active state to an inactive state. FIG. 29C shows a modification of the structure of FIG. 29B that bypasses a deactivated tap.

[0243] Referring to FIG. 29C, the filter structure includes a bypass circuit for each adder A_(i), i=0, . . . , N−1. Each bypass circuit includes a gate G_(i), shown as an AND gate, and a multiplexer U_(i). Each bypass circuit also has a control signal S_(i), which indicates whether the tap having the coefficient C_(i) and the adder A_(i) is active or inactive. S_(i) is set to one if the tap is to be active and to zero if the tap is to be inactive.

- When S_(i)=1, the output of gate G_(i) equals the data signal at its input, and the multiplexer U_(i) outputs only the signal from the adder A_(i).
- When S_(i)=0, the output of gate G_(i) is zero. The data signal at the input of gate G_(i) flows to the multiplexer U_(i) through the corresponding bypass connection B_(i), bypassing the adder A_(i). The multiplexer U_(i) then outputs only the signal from the bypass connection B_(i).
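The Python sketch below is a behavioral model of the bypass arrangement. `tap_stage` mimics gate G_(i), adder A_(i) and multiplexer U_(i) for one tap on the output path. `masked_fir` chains these stages into a direct-form filter. Both names are hypothetical, and giving every tap an adder stage is a simplification of FIG. 29C. The model shows that an inactive tap (S_(i)=0) contributes nothing, exactly as if its coefficient were zero.

```python
from typing import List, Sequence


def tap_stage(partial_sum: int, product: int, s: int) -> int:
    """One output-path stage with control signal S_i = s (0 or 1)."""
    gated = partial_sum if s else 0           # AND gate G_i passes data only when S_i = 1
    adder_out = gated + product               # adder A_i
    return adder_out if s else partial_sum    # multiplexer U_i selects bypass B_i when S_i = 0


def masked_fir(c: Sequence[int], s: Sequence[int], x: Sequence[int]) -> List[int]:
    """Direct-form FIR whose output path is a chain of tap_stage calls."""
    hist = [0] * len(c)                       # hist[i] holds x(n - i)
    out = []
    for xn in x:
        hist = [xn] + hist[:-1]
        acc = 0
        for ci, si, hi in zip(c, s, hist):
            acc = tap_stage(acc, ci * hi, si)
        out.append(acc)
    return out


# Taps 1 and 3 disabled: the result equals filtering with coefficients [3, 0, 4, 0].
print(masked_fir([3, -1, 4, 1], [1, 0, 1, 0], [1, 2, 3, 4]))  # [3, 6, 13, 20]
```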

[0244] The foregoing is only one exemplary implementation of a filter configuration in which taps can be switched between active and inactive states. In an alternative implementation, the multipliers M_(i), which receive filter coefficients from the associated coefficient registers, can be switched between active and inactive states.

[0245] FIG. 29D is a semi-schematic block diagram of a multiplier 2900, such as might be associated with each tap coefficient. The multiplier 2900 receives a coefficient word from a corresponding coefficient register. The coefficient word enters a multiplexer circuit 2902 in two configurations:

- a first, “raw” configuration, taken directly from the coefficient register;
- a second, “times 2” configuration, taken from the register but shifted one position to the left.

The second configuration therefore represents the raw value multiplied by two. The left shift leaves the least significant bit position vacant, so the “times 2” coefficient is padded with a 0 bit. This is done by tying the least significant bit to V_(SS), which is ground. The multiplier is particularly efficient because it takes advantage of the fact that the symbols can only have the values {−2, −1, 0, +1, +2}. The symbols are represented by three bits in sign-magnitude form, with bit 2 indicating the sign (+ or −).

[0246] A select OR gate 2904 ORs an OFF signal with symbol bit 0 to select which coefficient representation passes through the multiplexer 2902. When symbol bit 0 is 1, the raw coefficient is selected, which is the correct coefficient for a symbol value of −1 or +1. The raw coefficient is also selected when OFF is logical 1. The coefficient selected by the multiplexer 2902 goes to one input of an XOR gate 2906, where it is XORed with the output of a select AND gate 2908. The AND gate 2908 ANDs an inverted OFF signal with symbol bit 2. When OFF is logical 0 (i.e., inverted OFF is logical 1) and symbol bit 2 is 1, the XOR marks the coefficient as negative. The XOR is configured as a stack of 10 individual XOR gates, and manipulation of the carry bit determines the sign of the coefficient.

[0247] The signed coefficient goes to an additional AND gate 2910, where it is ANDed with the output of a second select AND gate 2912. The second select AND gate 2912 ANDs the inverted OFF signal with the result of ORing symbol bits 0, 1 and 2 in OR gate 2914. OR gate 2914 thus distinguishes the symbol value zero from all other symbol values. In effect, OR gate 2914 is a symbol {0} detect circuit.

[0248] Tap disablement is controlled by the OFF signal. When OFF is logical 1, the multiplexer is set to select “one”, i.e., the raw coefficient. Inverted OFF is then logical 0, so the first and second select AND gates 2908 and 2912 output zero regardless of the symbol bits. Because the output of AND gate 2912 is zero, the AND gate stack 2910 also outputs zero. That zero goes to the corresponding tap adder A_(i) on the output path of the adaptive filter (FIGS. 29A, 29B or 29C). Adding a zero requires no computation, so the tap is effectively deactivated.

[0249] The OFF signal is ORed in the OR gate 2904, and the inverse OFF signal is ANDed in the AND gate 2908, so that no transitions take place inside the multiplier while the tap is deactivated. Without the OFF input to the OR gate 2904, the select input of the multiplexer 2902 would toggle with symbol bit 0. Without the inverse OFF input to the AND gate 2908, one of the two inputs to the XOR 2906 would toggle with symbol bit 2. Each such transition would dissipate power. The inverse OFF signal is also ANDed in the AND gate 2912, which ensures that the multiplier output (the output of AND gate 2910) is zero when the tap is deactivated.
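The gate-level behavior described above can be checked with a small Python model of multiplier 2900. The model works at the signal level only. The coefficient is a Python integer. The XOR stack plus carry-in, which in two's complement amounts to inverting the bits and adding 1, is modeled as arithmetic negation. The model also assumes that zero is always encoded as 0b000. Because OR gate 2914 ORs all three bits, a “negative zero” code 0b100 would not be detected as zero. The function names are hypothetical.

```python
def encode_symbol(v: int) -> int:
    """3-bit sign-magnitude code for a PAM-5 symbol; bit 2 is the sign.

    Zero is assumed to be encoded as 0b000.
    """
    if v not in (-2, -1, 0, 1, 2):
        raise ValueError("symbol must be in {-2, -1, 0, +1, +2}")
    return (0b100 if v < 0 else 0) | abs(v)


def tap_multiplier(coeff: int, sym: int, off: int) -> int:
    """Behavioral model of multiplier 2900 (FIG. 29D); returns coeff * symbol."""
    b0, b1, b2 = sym & 1, (sym >> 1) & 1, (sym >> 2) & 1
    on = 1 - off                                      # inverted OFF
    select_raw = b0 | off                             # OR gate 2904
    magnitude = coeff if select_raw else coeff << 1   # mux 2902: raw or "times 2"
    negate = on & b2                                  # AND gate 2908
    signed = -magnitude if negate else magnitude      # XOR stack 2906 plus carry-in
    nonzero = on & (b0 | b1 | b2)                     # OR gate 2914 gated by AND gate 2912
    return signed if nonzero else 0                   # AND stack 2910
```

With `off=1`, `select_raw`, `negate` and `nonzero` are all held constant whatever the symbol bits are. This mirrors the text's point that a disabled tap sees no internal toggling.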

[0250] Referring back to FIG. 2, the adaptive FIR filters used as the echo canceller 232 and the three NEXT cancellers 229, 230 and 231 need large numbers of taps to cancel effectively over a wide range of twisted-pair cables. Echo/NEXT responses differ from cable to cable, and modeling them requires different taps in the cancellers. Cancellers are therefore built with enough taps to cancel the worst-case expected cable responses adequately. In the illustrated embodiment of the gigabit transceiver of FIG. 2, each echo canceller has one hundred ninety-two (192) taps and each NEXT canceller has thirty-six (36) taps. In addition, the DFE has a total of 132 taps, which are always active. The gigabit transceiver has four echo cancellers (one per constituent transceiver) and twelve NEXT cancellers (three per constituent transceiver). The total number of taps that can be activated or deactivated is therefore twelve hundred (1200). Each active tap consumes a small amount of power, but because there are so many taps, their combined consumption is large when all of them are active at once. Unregulated, this power consumption generally causes a high degree of localized heating in an integrated circuit. This heating often results in reliability issues, skewed circuit performance and, in some cases, catastrophic device failure.
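The count of switchable taps follows from the per-canceller figures:

$$\underbrace{4 \times 192}_{\text{echo}} + \underbrace{12 \times 36}_{\text{NEXT}} = 768 + 432 = 1200$$

Adding the 132 always-active DFE taps gives 1332 taps in total, of which 1200 are candidates for power regulation.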

[0251] This power consumption can be regulated because not all of the taps need to be active on any given channel at any given time. The taps that do not need to be active are those that do not contribute significantly to system performance. However, which taps are unnecessary at a given time is not known a priori, and an unnecessary tap can become necessary later because the cable response changes dynamically. The present invention dynamically determines which taps, if any, are unnecessary for adequate performance in a particular application, and deactivates them. It also re-activates any previously deactivated taps that later become necessary for system performance because the cable response has changed. As applied to the adaptive filters, the method of the present invention might be characterized as a tap power regulation method.

[0252] FIG. 30 is a flowchart of a first exemplary embodiment of a method that implements the principles of the present invention. The method takes a specified error and a specified power, which may be specified by a user. The specified power is the maximum power consumption allowed; if no power is specified, it is assumed to be infinite. The specified error is the maximum allowed degradation of system performance, and is preferably expressed as a mean squared error (MSE). Since the signal power is constant, the MSE corresponds to a ratio of mean squared error to signal (MSE/signal), usually expressed in decibels (dB).
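Expressing the error metric in dB is a one-line conversion, shown in the Python sketch below. The helper names are illustrative only. The error specification is taken to be met when the MSE/signal ratio is at or below the specified level.

```python
import math


def mse_to_signal_db(mse: float, signal_power: float) -> float:
    """Mean squared error relative to signal power, in dB."""
    return 10.0 * math.log10(mse / signal_power)


def meets_error_spec(mse: float, signal_power: float, spec_db: float) -> bool:
    """True if the MSE/signal ratio is at or below the specified error in dB."""
    return mse_to_signal_db(mse, signal_power) <= spec_db


print(mse_to_signal_db(1e-3, 1.0))  # -30.0
```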

[0253] In FIG. 30, no coefficient is active before process 3000 starts. Upon start (block 3002), process 3000 initializes a threshold (block 3004). This initial threshold can come from a simulation test, or can equal the minimum absolute value of a tap coefficient as known from past experiments. The value is not critical, provided it is low enough to avoid a large degradation of system performance. The taps in a first block are then activated (block 3006). The size of this first block, i.e., its number of taps, depends on the application; in one application it is 120. The coefficients of the active taps are trained with the LMS algorithm until they converge (block 3008).

[0254] The absolute values of the active tap coefficients are compared with the threshold (block 3010), and taps whose absolute values are below the threshold are deactivated (block 3012). An error metric and a power metric are then computed (block 3014). The error metric is typically a mean squared error (MSE), expressed as a ratio of mean squared error to signal.

Process 3000 then checks whether a first test is satisfied (block 3016). In the first embodiment of the invention, the first test is satisfied when the error metric is greater than the specified error and the power metric is smaller than the specified maximum power. An error metric above the specified error means that the threshold was set too high. Too many taps were deactivated as a result, degrading system performance by more than the specified amount. If the first test is satisfied, the threshold is decreased (block 3018), all the taps in the current block are activated again (block 3006), and process 3000 continues with the lower threshold.

If the first test is not satisfied, process 3000 determines whether all the taps of the filter have been considered (block 3020). If not, the next block of taps is considered and activated (block 3006); a typical size for this next block is 20. All active tap coefficients, including the newly activated ones, are converged with an LMS algorithm (block 3008), and process 3000 continues as described above.

[0255] Once all of the taps have been considered, process 3000 checks whether a second test is satisfied (block 3024). In the first embodiment of the invention, the second test is satisfied when the error metric is smaller than the specified error or the power metric is larger than the specified power.

- An error metric below the specified error means the threshold can be raised to deactivate more taps while still meeting the performance requirement.
- A power metric above the specified power means the threshold must be raised to lower power consumption, whatever the performance requirement.

If the second test is satisfied, the threshold is increased (block 3026) and the active taps are compared with the updated threshold (block 3010). Otherwise, process 3000 powers down the taps that follow the tap holding the highest-ordered active coefficient (block 3028). That is, if C_(k) is the highest-ordered active coefficient, all the taps with the deactivated coefficients C_(k+1) through C_(N−1) are powered down. The power-down function of block 3028 is described in more detail below. Process 3000 then terminates (block 3030).
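The Python sketch below walks through the FIG. 30 flow from block 3004 to block 3028. It is not the patented hardware. The function `regulate_taps`, the `converge` and `metrics` hooks, and the halve/double threshold step are all hypothetical.

The sketch also adds two rules of its own:

- an iteration guard on the block-wise pass;
- a stopping rule for the threshold-raise loop, under which a raise that would push the error over the specification while power is within budget is undone and the search ends.

The stopping rule is needed because the two tests as stated can otherwise alternate indefinitely. The usage lines replace LMS training with an idealized oracle that copies a known, made-up echo response.

```python
import math
from typing import Callable, List, Tuple

# Hypothetical hooks standing in for the hardware:
#   converge(active, coeffs): trains the active coefficients in place (LMS in the text)
#   metrics(active, coeffs) -> (error, power): error as MSE/signal, power in any unit
Converge = Callable[[List[bool], List[float]], None]
Metrics = Callable[[List[bool], List[float]], Tuple[float, float]]


def regulate_taps(n_taps: int, converge: Converge, metrics: Metrics,
                  spec_error: float, spec_power: float = math.inf,
                  threshold: float = 1e-3, first_block: int = 120,
                  next_block: int = 20, step: float = 2.0,
                  max_iters: int = 10_000) -> Tuple[List[bool], float, int]:
    """Sketch of process 3000; returns (active flags, threshold, power-down index)."""
    active = [False] * n_taps
    coeffs = [0.0] * n_taps

    def prune(th: float) -> Tuple[float, float]:
        # Blocks 3010-3014: deactivate taps below th, then measure.
        for i in range(n_taps):
            if active[i] and abs(coeffs[i]) < th:
                active[i] = False
        return metrics(active, coeffs)

    start, end = 0, min(first_block, n_taps)
    for _ in range(max_iters):
        for i in range(start, end):                 # block 3006
            active[i] = True
        converge(active, coeffs)                    # block 3008
        err, pwr = prune(threshold)
        if err > spec_error and pwr < spec_power:   # first test, block 3016
            threshold /= step                       # block 3018: retry this block
        elif end < n_taps:                          # block 3020: next block
            start, end = end, min(end + next_block, n_taps)
        else:
            break
    else:
        raise RuntimeError("block-wise pass did not settle (hypothetical guard)")

    # Blocks 3024-3026: raise the threshold while the second test holds.
    # Hypothetical stopping rule: undo a raise that breaks the error spec
    # while power is within budget, then stop.
    while (err < spec_error or pwr > spec_power) and any(active):
        saved = active[:]
        trial_err, trial_pwr = prune(threshold * step)
        if trial_err > spec_error and trial_pwr <= spec_power:
            active[:] = saved
            break
        threshold *= step
        err, pwr = trial_err, trial_pwr

    # Block 3028: power down every tap after the highest-ordered active one.
    last = max((i for i, a in enumerate(active) if a), default=-1)
    return active, threshold, last + 1


# Usage with idealized, hypothetical hooks: "training" copies a known response h,
# the error is the echo energy left in inactive taps (unit-power white input and
# unit signal power assumed), and power counts the active taps.
h = [0.5 * 0.6 ** i for i in range(40)]


def oracle_converge(active: List[bool], coeffs: List[float]) -> None:
    for i, a in enumerate(active):
        if a:
            coeffs[i] = h[i]


def oracle_metrics(active: List[bool], coeffs: List[float]) -> Tuple[float, float]:
    residual = sum(h[i] ** 2 for i, a in enumerate(active) if not a)
    return residual, float(sum(active))


active, th, power_down_from = regulate_taps(
    40, oracle_converge, oracle_metrics, spec_error=1e-4,
    first_block=10, next_block=10)
```

In this example the threshold doubles from 0.001 to 0.008. The next doubling would push the residual error over 1e-4, so it is undone. Taps 0 through 8 stay active, and taps 9 through 39 are powered down.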

[0256] When process 3000 is restarted (block 3032), a block of taps is activated (block 3006). Upon restart of process 3000, the threshold is at its last value from the last application of process 3000. The coefficients that were previously deactivated are activated with their values remaining at their last values before deactivation. Then process 3000 proceeds as described above.

[0257] Periodic restart of process 3000 is desirable for the following reason. In some cases, the echo/NEXT path impulse response may change during normal operation. For example, this change may be a result of temperature changes. To correct for this change, process 3000 periodically restarts to turn on the deactivated coefficients in a sequential manner (block 3006), re-converges the coefficients (block 3008), and determines whether the previously deactivated coefficients are still below the threshold (block 3010). If the previously deactivated coefficients are now converged to values above the threshold, they remain active; otherwise they are deactivated (block 3012). Any of the initially active coefficients that now fall below the threshold are also deactivated (block 3012).

[0258] The underlying reason for activating the taps a few at a time (blocks 3006 through 3020) is the following. When the total number of taps is very large, the power consumption can be very large during the initial convergence transient. This peak power consumption is very undesirable, and is unaffected by the tap power regulation process (which can only reduce the average power consumption of the filters). One solution to this peak power consumption problem is to activate and converge the taps in an initial small block of taps (blocks 3006, 3008), deactivate some of the converged taps according to a criterion (blocks 3010 through 3020), activate a next block of taps (block 3006), converge all the active taps including the newly activated taps (block 3008), and repeat the process of deactivation, activation and convergence until all the taps of the filter are processed.

[0259] Power-down block 3028 of process 3000, which is optional, helps further reduce the power consumption of the adaptive filters. Without block 3028, although the tap power regulating process 3000 already achieves a large reduction of the power consumption by reducing the number of active taps, a significant amount of power is still dissipated by the long delay line of the adaptive filter. By delay line, it is meant the line connecting the delay elements together. Turning a tap off does not necessarily affect the configuration of the delay line. However, in many practical cases, many of the deactivated taps are located contiguously at the highest-ordered end of the filter. An example of such a case is when the cable is short and well behaved. In such cases, the portion of the delay line associated with these contiguously deactivated taps can be completely powered down without affecting the transfer function of the filter. This powering down contributes an additional reduction of the power dissipation of the filter. In one exemplary application, this additional reduction of power dissipation is approximately 300 milliwatts (mW) per echo canceller and 70 mW per NEXT canceller, resulting in a power saving of 2.04 watts (W) for the gigabit transceiver, which contains four echo cancellers and twelve NEXT cancellers.
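The 2.04 W figure follows directly from the per-filter savings, given that each of the four constituent transceivers contains one echo canceller and three NEXT cancellers. A minimal arithmetic check in Python (the constant and function names are illustrative):

```python
ECHO_SAVING_MW = 300          # delay-line power-down saving per echo canceller
NEXT_SAVING_MW = 70           # delay-line power-down saving per NEXT canceller
CONSTITUENT_TRANSCEIVERS = 4  # one per twisted pair
NEXT_PER_TRANSCEIVER = 3      # three NEXT cancellers per constituent transceiver


def total_saving_w() -> float:
    """Total delay-line power-down saving for the gigabit transceiver, in watts."""
    per_transceiver_mw = ECHO_SAVING_MW + NEXT_PER_TRANSCEIVER * NEXT_SAVING_MW  # 510 mW
    return CONSTITUENT_TRANSCEIVERS * per_transceiver_mw / 1000.0
```

Each constituent transceiver saves 300 + 3 × 70 = 510 mW, and four of them save 2040 mW.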

[0260] An exemplary implementation of block 3028 is as follows. An additional bit, called the delay line enable bit, is associated with each tap of a filter. This bit is initially ON. When process 3000 reaches block 3028, all of the taps are scanned for active status starting from the highest-ordered end of the filter, i.e., the tap including the coefficient C_(N−1), towards the lowest-ordered end, i.e., the tap including the coefficient C₀. During scanning, the delay line enable bits of the scanned inactive taps are switched OFF until the highest-ordered active tap is found. At this point, the scanning for tap active status terminates. Then all the delay line sections corresponding to the taps whose delay line enable bits are OFF are powered down.
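The enable-bit scan can be sketched in Python as follows. This sketch assumes, hypothetically, that a filter's active status is held in a list `tap_on` indexed from C₀ to C_(N−1). The function name is illustrative, and the actual power-down of the delay line sections is represented only by the returned bits.

```python
from typing import List


def delay_line_enable_bits(tap_on: List[bool]) -> List[bool]:
    """Return one delay line enable bit per tap (True = section stays powered)."""
    enable = [True] * len(tap_on)                 # all enable bits start ON
    for addr in range(len(tap_on) - 1, -1, -1):   # scan from C_(N-1) down to C_0
        if tap_on[addr]:
            break                                 # highest-ordered active tap found: stop
        enable[addr] = False                      # inactive tail tap: switch its bit OFF
    return enable
```

Inactive taps that lie below the highest-ordered active tap keep their enable bits ON, because the delay line still has to carry samples through them to the active taps further along.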

[0261] Activation block 3006 of FIG. 30 is applied sequentially to the echo canceller 232 and the three NEXT cancellers 229, 230 and 231 (of FIG. 2). FIG. 31 illustrates the flowchart of one exemplary embodiment of the activation block 3006.

[0262] Referring to FIG. 31, upon start (block 3102), the process 3006 sets the filter number to zero (block 3104) to operate on the echo canceller. The filter number zero represents the echo canceller, while filter numbers 1 through 3 represent the three NEXT cancellers, respectively. Process 3006 then sets the address and the end equal to the start address and the end address of the block of taps, respectively (block 3106). The modules TapOn and TapPowerUp are invoked with the address as argument (block 3108). The module TapOn turns on the circuitry of the tap having the specified address. This circuitry includes a 1-bit storage to indicate the active status of the tap. When the tap is turned on, the tap is included in the computation of the output y(n) of the filter (referring to Equation (1)) and in the adaptation process, i.e., the training and convergence of the filter coefficients. The module TapPowerUp turns the power on for the delay line section associated with the tap having the specified address. Process 3006 then determines whether the address is equal to the end. If it is not, then the address is increased by one (block 3112) to consider the next tap of the filter. If the address has reached the end address of the block of taps, then process 3006 determines whether the filter number is equal to 3, i.e., whether all the filters in the transceiver have been considered (block 3114). If not, then the filter number is increased by one, so that the next filter is considered. If process 3006 has operated on all the filters, then process 3006 sets the start address equal to the old end address, and sets the new end address equal to the sum of the old end address and the block size, the block size being the size of the next block of taps to be activated (block 3118). Process 3006 then terminates (block 3120).
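The following is a minimal sketch of activation block 3006. It assumes that the TapOn status bits and the delay line power states of the four filters are held in hypothetical lists of lists named `tap_on` and `delay_on`. The end address is treated as inclusive, as in FIG. 31, so consecutive blocks share their boundary tap. Re-activating a tap that is already active is harmless. Clipping each block to the filter's own length is an added assumption, since the echo canceller and the NEXT cancellers may have different numbers of taps.

```python
from typing import List, Tuple

NUM_FILTERS = 4  # filter 0: echo canceller; filters 1-3: NEXT cancellers


def activate_block(tap_on: List[List[bool]], delay_on: List[List[bool]],
                   start: int, end: int, block_size: int) -> Tuple[int, int]:
    """Activate taps start..end (inclusive) in each filter; return the next (start, end)."""
    for f in range(NUM_FILTERS):                      # blocks 3104 and 3114
        last = min(end, len(tap_on[f]) - 1)           # assumption: clip to filter length
        for addr in range(start, last + 1):           # blocks 3106 through 3112
            tap_on[f][addr] = True                    # module TapOn
            delay_on[f][addr] = True                  # module TapPowerUp
    return end, end + block_size                      # block 3118
```

The caller keeps the returned pair and passes it back in on the next activation, which reproduces the sliding block window of FIG. 31.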

[0263] Deactivation block 3012 of FIG. 30 is applied sequentially to the echo canceller 232 and the three NEXT cancellers 229, 230 and 231 (of FIG. 2). FIG. 32 illustrates the flowchart of one embodiment of the deactivation block 3012.

[0264] Referring to FIG. 32, upon start (block 3202), the process 3012 sets the filter number to zero (block 3204) to operate on the echo canceller. The filter number zero represents the echo canceller, while filter numbers 1 through 3 represent the three NEXT cancellers, respectively. Process 3012 then sets the address equal to zero and the end equal to the length of the filter minus 1 (block 3206). If the absolute value of the tap coefficient at the specified address is less than T, the threshold, then the module TapOn is invoked to turn off the circuitry associated with the tap having the specified address (block 3208). When the tap is turned off, the tap is removed from the computation of the output y(n) of the filter (referring to Equation (1)) and from the adaptation process, i.e., the training and convergence of the filter coefficients. Process 3012 then determines whether the address is equal to the end. If it is not, then the tap address is increased by one (block 3212) to consider the next tap of the filter. If the address has reached the end of the filter taps, then process 3012 determines whether the filter number is equal to 3, i.e., whether all the filters in the transceiver have been considered (block 3214). If not, then the filter number is increased by one, so that the next filter is considered (block 3216). If process 3012 has operated on all the filters, then process 3012 terminates (block 3218).
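Deactivation block 3012 amounts to a threshold comparison over every tap of every filter. In the Python sketch below, all names are hypothetical. The optional `k_exempt` parameter models the third embodiment described later in this section, in which the first taps of each filter are never deactivated. With the default value of 0, the sketch follows FIG. 32 as described above.

```python
from typing import List


def deactivate(coeffs: List[List[float]], tap_on: List[List[bool]],
               threshold: float, k_exempt: int = 0) -> None:
    """Turn off every tap whose |coefficient| < threshold, skipping the first k_exempt taps."""
    for f, filt in enumerate(coeffs):                 # filter 0 = echo, 1-3 = NEXT
        for addr in range(k_exempt, len(filt)):       # block 3206 starts at 0 (or K)
            if abs(filt[addr]) < threshold:
                tap_on[f][addr] = False               # block 3208: tap turned off
```

The function only clears status bits. Taps at or above the threshold keep whatever status they already had, which matches the one-way nature of block 3012.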

[0265] Error-computing block 3014 of FIG. 30 is applied sequentially to the echo canceller 232 and the three NEXT cancellers 229, 230 and 231 (of FIG. 2). FIG. 33 illustrates the flowchart of one embodiment of the error-computing block 3014.

[0266] Referring to FIG. 33, upon start (block 3302), the process 3014 sets the filter number to zero (block 3304) to operate on the echo canceller, and initializes the error metric MSE, the power metric and the flag. The filter number zero represents the echo canceller, while filter numbers 1 through 3 represent the three NEXT cancellers, respectively. Process 3014 then sets the address equal to the length of the filter minus 1 (block 3306) to scan the filter taps from the highest-ordered end. The reason for using this scanning order and the flag is to ensure that the taps that will be powered down in block 3028 of FIG. 30 are excluded from the computation of the power metric. A deactivated tap that is not actually powered down still consumes a small amount of power because of its associated delay line section. To compute the new power metric such that it can be used to accurately regulate the power consumption of the system, the process 3014 must exclude from the computation the power consumption of a deactivated tap that will be powered down.

[0267] If TapOn[addr] is zero, i.e., if the tap at the specified address is turned off, then process 3014 computes the new error metric MSE by adding to the previous value of MSE the squared value of the tap coefficient at the specified address. Otherwise, if the tap at the specified address is on, then the flag is set to 1. If the flag is 1, then process 3014 computes the new power metric by adding to the previous value of the power metric the estimated power consumption TapPower of the tap having the specified address (block 3308). TapPower is chosen from precomputed values stored in a look-up table. These precomputed values are functions of the size of the coefficients and of the active or inactive status of the coefficient.

[0268] Process 3014 determines whether the address is 0 (block 3310). If it is not, then the tap address is decreased by one (block 3312) to consider the next tap of the filter. If the address has reached 0, then process 3014 determines whether the filter number is equal to 3, i.e., whether all the filters in the transceiver have been considered (block 3314). If not, then the filter number is increased by one, so that the next filter is considered, and the flag is reset to 0 (block 3316). If process 3014 has operated on all the filters, then process 3014 terminates (block 3318).
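The scan of FIG. 33 can be sketched in Python as follows. The proxy MSE accumulates the squared values of the deactivated coefficients. The power metric starts accumulating only once the scan, moving down from the highest-ordered end, reaches the first active tap, so that taps destined to be powered down in block 3028 are excluded. The look-up table TapPower is modeled here as a hypothetical callable `tap_power(coefficient, active)`, because its real contents are not given in the text.

```python
from typing import Callable, List, Tuple


def error_and_power(coeffs: List[List[float]],
                    tap_on: List[List[bool]],
                    tap_power: Callable[[float, bool], float]) -> Tuple[float, float]:
    """Return (proxy MSE, power metric) over all filters, following FIG. 33."""
    mse = 0.0
    power = 0.0
    for f, filt in enumerate(coeffs):
        flag = False                                  # reset for each filter
        for addr in range(len(filt) - 1, -1, -1):     # highest-ordered end first
            if not tap_on[f][addr]:
                mse += filt[addr] ** 2                # deactivated tap adds to proxy MSE
            else:
                flag = True                           # first active tap seen from the top
            if flag:
                # Taps at or below the highest active tap keep their delay line
                # section powered, so they count toward the power metric.
                power += tap_power(filt[addr], tap_on[f][addr])
    return mse, power
```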

[0269] As shown in FIG. 33, the error metric MSE is computed by summing the squared values of the deactivated tap coefficients. It is noted that the error metric can be computed differently, for example by deriving it from the error component 42A of the 4-D error signal 42 output from the trellis decoder 38 (FIG. 2).

[0270] The MSE as measured from the error output 42 of the trellis decoder 38 (FIG. 2) will hereinafter be referred to as the true MSE. The MSE as measured by summing the squared values of the coefficients of the deactivated taps will hereinafter be referred to as the proxy MSE.

[0271] There is an advantage in using the proxy MSE, instead of the true MSE, as the error metric. Since the proxy MSE is based solely on the coefficient values of the deactivated taps, it represents only one component of the noise signal of the gigabit transceiver (other components may be due to quantization noise, external noise, etc.). Therefore, the proxy MSE is unaffected when large external noise, other than echo or NEXT noise, severely affects the noise signal, and hence the noise to signal ratio, of the gigabit transceiver. For this reason, the proxy MSE is preferred as the error metric.

[0272] If the true MSE is used as the error metric, then the specified error is preferably set at a value corresponding to a noise to signal ratio of about −22 dB. Although a true MSE corresponding to a noise to signal ratio of −19 dB is theoretically acceptable for the gigabit transceiver, in practice it is difficult to obtain adequate system performance at that level. If the proxy MSE is used as the error metric, then the specified error is preferably set at a value corresponding to a noise to signal ratio of about −24 dB.
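The specified errors above are given as noise to signal ratios in dB. Assuming, as is conventional, that the ratio is a power ratio (MSE divided by signal power), the corresponding linear thresholds can be obtained with a small conversion. The helper names below are illustrative.

```python
import math


def nsr_db_to_ratio(nsr_db: float) -> float:
    """Convert a noise-to-signal power ratio from dB to a linear ratio."""
    return 10.0 ** (nsr_db / 10.0)


def ratio_to_nsr_db(ratio: float) -> float:
    """Convert a linear noise-to-signal power ratio to dB."""
    return 10.0 * math.log10(ratio)
```

For example, −24 dB corresponds to an MSE of roughly 0.40% of the signal power, −22 dB to roughly 0.63%, and −19 dB to roughly 1.26%.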

[0273] Power-down block 3028 of FIG. 30 is applied sequentially to the echo canceller 232 and the three NEXT cancellers 229, 230 and 231 (of FIG. 2). FIG. 34 illustrates the flowchart of one embodiment of the power-down block 3028.

[0274] Referring to FIG. 34, upon start (block 3402), the process 3028 sets the filter number to zero (block 3404) to operate on the echo canceller first. The filter number zero represents the echo canceller, while filter numbers 1 through 3 represent the three NEXT cancellers, respectively. Process 3028 then sets the address equal to the length of the filter minus 1 and the end equal to zero (block 3406). This means that the process 3028 starts from the highest-ordered end of the filter and proceeds towards the lowest-ordered end.

[0275] Process 3028 determines whether TapOn[addr] is 1, i.e., whether the tap at the specified address is active (block 3408). If the tap is not active, then process 3028 turns off the power to the tap (block 3410), then checks whether the address is equal to the end (block 3412). If the address is not equal to the end, the address is decreased by 1 to consider the next lower-ordered tap (block 3414). If the address has reached the end, then process 3028 determines whether the filter number is 3, i.e., whether all 4 filters have been considered (block 3416). If the filter is not the last one, then the filter number is increased by 1 so that the next filter is considered (block 3418). Otherwise, process 3028 terminates (block 3420).

[0276] If TapOn[addr] is 1 (block 3408), i.e., if the tap at the specified address is active, then process 3028 stops scanning the taps in the filter being considered and checks the next filter, if any (block 3416). Process 3028 then proceeds as described above.

[0277] The process 3000 of FIG. 30 is applied to the echo and NEXT cancellers of each of the 4 constituent transceivers of the gigabit transceiver 102 depicted in FIGS. 2 and 3. It is important to note that, if process 3000 is applied simultaneously to the 4 constituent transceivers, there will be a power-demand surge in the gigabit transceiver 102. In order to avoid such a power-demand surge, process 3000 is applied to the 4 transceivers in a time-staggered manner.

[0278] In a second embodiment of the present invention, two different specified errors are used in order to avoid possible limit cycle oscillations between activation and deactivation. The flowchart of the second embodiment is substantially similar to the one shown in FIG. 30. The second embodiment differs from the first embodiment by using a first specified error for the first test in block 3016 (FIG. 30) and a second specified error for the second test in block 3024. The first specified error is substantially larger than the second specified error. The use of two different specified errors, sufficiently distant from each other, allows the process 3000 to terminate when the computed error metric has a value located between the two specified errors. When just one specified error is used, as in the first embodiment, the computed error metric may jump back and forth around the specified error, causing the process 3000 to oscillate between activation and deactivation.
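The effect of using two specified errors can be seen in a reduced version of the two tests, evaluated once all taps have been considered. In this sketch, the function and parameter names are hypothetical. Any error metric between `second_spec` and `first_spec` stops the threshold adjustment. With a single specified error, adjustment stops only on an exact match.

```python
def threshold_step(error: float, power: float, first_spec: float,
                   second_spec: float, max_power: float) -> int:
    """-1: lower the threshold, +1: raise it, 0: stop (all taps already considered)."""
    if first_spec < second_spec:
        raise ValueError("first_spec must not be smaller than second_spec")
    if error > first_spec and power < max_power:      # first test, block 3016
        return -1
    if error < second_spec or power > max_power:      # second test, block 3024
        return +1
    return 0
```

Setting `first_spec == second_spec` recovers the first embodiment, in which an error metric of 1.5 against a specified error of 1.0 keeps lowering the threshold instead of settling.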

[0279] In a third embodiment of the present invention, the first few taps of each filter, e.g., the first 10 taps, are exempt from deactivation in order to avoid possible degradations of the system performance in the presence of jitter. The effect of jitter on these few taps is as follows. There is usually a large slew rate in these first few taps. Due to this slew rate, their numerical values could change significantly if the sampling phase of the received signal changes. In the presence of jitter, the sampling phase of the received signal can change dynamically. Thus, even if some of the first few taps were insignificant for the system performance, they could become significant as the sampling phase changes. For this third embodiment, the flowchart of the deactivation process of block 3012 is slightly different from the one shown in FIG. 32. The only modification to the flowchart of FIG. 32 is to set the address, in block 3206, to K instead of 0, where K is the number of the first few taps exempt from deactivation (taps 0 through K−1).

[0280] A fourth embodiment of the present invention uses, as the error metric, the change in the true MSE instead of the true MSE itself. In other words, the value of {new (true MSE) − old (true MSE)} is computed and used as the error metric. In the fourth embodiment, the first test in block 3016 is satisfied if the change in the true MSE is greater than a specified change value (e.g., a value that corresponds to a noise to signal ratio (NSR) change of 1 dB) and the power metric is smaller than the specified maximum power. The second test in block 3024 is satisfied if the change in the true MSE is smaller than the specified change value or the power metric is greater than the specified maximum power. For example, if the true MSE is at a value corresponding to an NSR of −25 dB before the tap power regulating process is applied, and if the specified change value corresponds to an NSR change of 1 dB, then the final true MSE will be at a value corresponding to an NSR of about −24 dB. This fourth embodiment can be used when there is large external noise other than echo or NEXT noise. In such a case, the true MSE is large even before the tap power regulation process is applied. Thus, if the true MSE is used as the error metric, practically no taps will be deactivated, resulting in large power dissipation. In this situation, since the large noise is not caused by uncancelled echo and NEXT impairments, a large number of taps could be deactivated without causing significant additional degradation to the system performance. The fourth embodiment allows these taps to be deactivated in this situation.

[0281] In a fifth embodiment, all of the taps in a filter are initially activated and converged, instead of being activated in blocks and converged in stages as in the first embodiment. The flowchart of the fifth embodiment is similar to the one of the first embodiment shown in FIG. 30, except for the following two differences. The first difference is that, in the activation block 3006, the block of taps is set to include all of the taps in the filter. The second difference is that block 3020 is not needed.

[0282] In each of the embodiments, there are several ways of computing the error metric. The error metric can be computed as a measurement of the system performance degradation caused by the filter being considered, by the four filters in the constituent transceiver being considered, or by all 16 filters in the four constituent transceivers of the gigabit transceiver.

[0283] When computed as a measurement of the degradation caused by all 4 filters of the constituent transceiver being examined, the error metric provides a good indication of the bit error rate of that constituent transceiver.

[0284] In the case where the error metric is computed as a measurement of the degradation caused by all 16 filters in the 4 constituent transceivers of the gigabit transceiver, the power regulation process can allow the filters in one of the 4 transceivers to have a larger error and compensate for this error in the filters of the other 3 transceivers. For example, if the echo/NEXT impairments in one particular transceiver are very severe and too many active taps would be needed to cancel them, then the power regulation process can allow the impairments to stay severe in this transceiver and allocate the power resource to the other 3 transceivers instead. It is noted that, in this case, the trellis decoder 38 still decodes correctly since it uses signal samples from all four transceivers in its decoding scheme.

[0285] When applied to the echo and NEXT cancellers of the gigabit transceiver, for typical channels, the power regulation process of the present invention results in a large number of taps being deactivated and the power consumption being reduced by a large factor. Simulation tests confirm this result.

[0286] FIG. 35 illustrates an exemplary impulse response of the echo characteristic developed by a typical multi-pair transmission channel in response to a known impulse. FIG. 36 illustrates an exemplary impulse response of the near end crosstalk (NEXT) characteristics developed by a typical transmission channel in response to a similar known impulse. FIGS. 37A and 37B illustrate the results of simulations performed to evaluate the application of tap power regulation methodologies to a local constituent transceiver and a remote constituent transceiver connected together through a transmission channel having the echo impulse response of FIG. 35.

[0287] During the initial period of communication, through a process known as Auto-Negotiation, the two transceivers negotiate and agree on their respective status as Master and Slave. FIGS. 37A and 37B show the MSE to signal ratio, expressed in dB, as a function of time, with time expressed in bauds (symbol periods), for the Master and Slave transceivers, respectively. Each point on the graphs in FIGS. 37A and 37B is obtained by averaging the instantaneous measurements taken over 10,000 symbol periods. The error metric MSE is computed based on the error signal 42A (in FIG. 2), i.e., the error as seen by the trellis decoder 38 (FIG. 2).

[0288] Referring to FIGS. 37A and 37B, during the time interval from 0 bauds to about 1.2×10⁵ bauds, the Master trains its own echo canceller while transmitting with an independent, fixed clock. During this time interval, the Slave synchronizes to the signal transmitted by the Master and trains its feed-forward equalizer and its timing recovery block. During the time interval from about 1.2×10⁵ bauds to about 2.2×10⁵ bauds, the Slave trains its echo canceller while transmitting. During this time interval, the Master is not transmitting, only receiving from the Slave, and trains its feed-forward equalizer and its timing recovery block to account for the delay in the channel. By the end of this time interval, the Master and Slave are synchronized with each other.

[0289] During the time interval from about 2.2×10⁵ bauds to about 3.2×10⁵ bauds, both the Master and Slave transmit and receive. During this time interval, the Master retrains its echo canceller and readjusts its timing. From about 3.2×10⁵ bauds, both the Master and Slave echo cancellers converge. At about 3.6×10⁵ bauds, the tap power regulating process of the present invention is applied to both echo cancellers, with the specified error, i.e., the maximum acceptable system performance degradation, set at a value corresponding to an NSR of −24 dB. As shown in FIGS. 37A and 37B, for both local and remote transceivers, the MSE increases to and stays at this specified error corresponding to an NSR of −24 dB. In this example, in each constituent transceiver, the echo canceller initially has 140 taps, and each of the three NEXT cancellers initially has 100 taps. The total number of initial taps in each constituent transceiver is therefore 440.

[0290] FIGS. 38A and 38B are graphs of the values of the tap coefficients of the echo canceller as a function of the tap number, after application of the tap power regulation process with the specified error set at values corresponding to noise to signal ratios of −24 dB and −26 dB, respectively. The deactivated coefficients are shown as having value zero.

[0291] Referring to FIG. 38A, the number of taps remaining active after application of the tap power regulation process with the specified error corresponding to an NSR of −24 dB is 22. For this specified error, the numbers of remaining active taps for the three NEXT cancellers are 6, 2 and 0, respectively (not illustrated). Thus, out of a total of 440 initially active taps in the constituent transceiver, only 30 remain active after application of the process of the present invention, while a 5 dB margin is maintained for the required bit error rate.

[0292] Referring to FIG. 38B, after application of the tap power regulation process with the specified error corresponding to an NSR of −26 dB, the number of taps remaining active is 47. For this specified error, the numbers of remaining active taps for the three NEXT cancellers (not illustrated) are 6, 2 and 0, respectively. Thus, out of a total of 440 initially active taps in the constituent transceiver, only 55 remain active after application of the process of the present invention, while a 7 dB margin is maintained for the required bit error rate.

[0293] FIGS. 38A and 38B show that the surviving taps occur at sparse locations. This is due to the strong dependence of the echo/NEXT cancellers on the specific cable response. Since the response characteristics of any given cable making up the transmission channel cannot be determined a priori, it would be impossible, in practice, to predict and statically allocate the surviving taps during the design of the echo and NEXT cancellers. Therefore, a dynamic active tap identification and allocation process according to the invention offers significant power reduction benefits over conventional methodologies.

[0294] While the systems and methods of the invention have been described mainly in terms of their applicability to adaptively configuring active tap sets for high-order digital filters, the dynamic power regulation methodology of the present invention can also be applied to complete computation modules of a transceiver, in cases where the computational power of such modules is not needed for a particular application. In these cases, a similar methodology applies, i.e., evaluate a signal performance metric of a signal output from a computational module against a performance threshold and, where the performance metric is greater than the threshold, power down the computational module.

[0295] This additional embodiment of the invention is particularly advantageous in cases where the transmission channel might be implemented with short (<3 meters) cable lengths, resulting in the relative absence of transmission-channel-induced intersymbol interference (ISI). Returning momentarily to the description of the trellis decoder circuit accompanying FIG. 3, in the absence of intersymbol interference, symbols received from the deskew memory 36 need only be decoded by the Viterbi decoder 604 and its associated modules, i.e., the path metrics module 606 and the path memory module 608, without resorting to a decision-feedback sequence estimation approach, as discussed previously. In this case, the dynamic power regulation process reduces the power consumption of the gigabit transceiver by deactivating and bypassing the computational modules represented by the MDFE 602, the DFE 612 and the select logic 610. Since received symbols are relatively unaffected by channel-induced ISI, there is no need to develop ISI compensation for incoming signal samples prior to symbol decoding, and therefore no need for ISI compensation circuitry.

[0296] FIG. 39 is a simplified, semi-schematic block diagram of an exemplary trellis decoder 38 as it might be implemented in the case where it has been determined that there is substantially no channel-induced intersymbol interference. Referring to FIG. 39, the 4-D output signal 37 from the deskew memory 36 is provided directly to the Viterbi decoder 604 as the Viterbi input. In accordance with the invention, it should be noted that, in the absence of intersymbol interference, only a single 4-D Viterbi input is needed, in contrast to the eight state inputs required in the full ISI compensation case.

[0297] As illustrated in FIG. 39, the DFE, MDFE and decoder circuitry has been replaced by a series of simple delay stages and an adder circuit, with the deskew output signal (a signal sample) input directly to the Viterbi decoder 604. The deskew output signal sample is also directed through a set of three series-coupled sequential delay stages 3920, 3922 and 3924 and then to an adder circuit 3926. In the adder circuit 3926, signal samples are added to the negative of the first tentative decision V_(0F) output by the path memory module 608 in order to develop an error term. The error term is directed through an additional delay stage 3928, after which the error term 42 might be directed to an adaptive gain stage (34 of FIG. 2) and a timing recovery circuit (222 of FIG. 2). In the exemplary embodiment shown in FIG. 39, the 4-D error 42 is computed as the delayed difference between the delayed 4-D input 37 and the 4-D output V_(0F) of the path memory module 608. The corresponding 4-D tentative decision 44 may be represented as nothing more than a delayed version of the 4-D output V_(0F) of the path memory module 608, the delay occurring in an additional delay stage 3930. In the embodiment shown in FIG. 39, the error and tentative decision delay elements 3928 and 3930, respectively, are used to ensure that the error 42 and the tentative decision 44 arrive at the timing recovery block (222 of FIG. 2) at the same time. Depending on the design and implementation of the timing recovery block, these delay elements may not be needed in alternative embodiments.
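The FIG. 39 error path can be modeled cycle by cycle. The Python sketch below handles a single dimension of the 4-D signal and treats the Viterbi decoder 604 as a black box whose output V_(0F) is supplied by the caller. For illustration only, it assumes that V_(0F) lines up with the input sample emerging from the three delay stages. The class and method names are hypothetical.

```python
from collections import deque
from typing import Tuple


class NoIsiErrorPath:
    """One dimension of the FIG. 39 error / tentative-decision path (illustrative)."""

    def __init__(self) -> None:
        self.input_delay = deque([0.0, 0.0, 0.0], maxlen=3)  # stages 3920, 3922, 3924
        self.error_reg = 0.0                                 # stage 3928
        self.decision_reg = 0.0                              # stage 3930

    def step(self, sample: float, v0f: float) -> Tuple[float, float]:
        """Clock one symbol period; return (error 42, tentative decision 44)."""
        delayed = self.input_delay[0]           # input sample from three periods ago
        self.input_delay.append(sample)         # shift the three-stage delay line
        outputs = (self.error_reg, self.decision_reg)
        self.error_reg = delayed - v0f          # adder 3926: delayed input minus V_0F
        self.decision_reg = v0f                 # delay stage 3930
        return outputs
```

Because both outputs pass through one register each, the error and the tentative decision for the same symbol leave `step` on the same cycle. This mirrors the purpose of delay elements 3928 and 3930.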

[0298] FIG. 40 illustrates yet a further embodiment of the invention, which is particularly advantageous in situations where the signal-to-noise ratio is very high (as may happen with a short cable, e.g., of less than 50 meters). In such situations, the coding gain provided by the trellis code may not be needed, and adequate system performance, as indicated by the bit error rate, may be achieved without making use of this coding gain. In these situations, substantial power dissipation reductions can be achieved by disabling the trellis decoding features of the complex Viterbi decoder, including the Viterbi decoder block 604, its associated path metric and path memory modules 606 and 608, and a large portion of the ISI compensation circuitry, including the MDFE 602 and the select logic 610. These portions are replaced, or substituted, with a simple symbol-by-symbol decoder and a simple decision feedback equalizer to detect the received signal, instead of using the computationally complex Viterbi decoder.

[0299] Referring to FIG. 40, signal samples output by the deskew memory are directed through an adder circuit 4032, which determines the difference between the input signal samples and the 4-D output of a DFE 4034. A symbol-by-symbol decoder 4036 receives the difference between the 4-D signal samples and the 4-D output from the DFE 4034 and decodes it. A 4-D tentative decision 44 is taken directly from the output of the symbol-by-symbol decoder 4036, and an error term 42 is developed by an additional adder circuit 4038, coupled to define the difference between the input and the output of the symbol-by-symbol decoder 4036. A soft decision 43, which is used for display purposes only, is taken directly from the input of the symbol-by-symbol decoder 4036.

[0300] Final decisions are developed by delaying the output of the symbol-by-symbol decoder through three series-coupled sequential delay stages 4040, 4042 and 4044. The output of each respective delay stage is directed to the DFE as a corresponding tentative decision V_(0F), V_(1F) and V_(2F).
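The FIG. 40 datapath can likewise be sketched for one dimension. This Python model makes three assumptions: PAM-5 symbols (the five-level alphabet used by 1000BASE-T), a nearest-level slicer as the symbol-by-symbol decoder 4036, and a DFE 4034 with only three fixed taps applied to V_(0F), V_(1F) and V_(2F). A real DFE would be adaptive and may use more taps. All names are hypothetical.

```python
from typing import List, Sequence, Tuple

PAM5_LEVELS = (-2, -1, 0, 1, 2)


def slice_pam5(x: float) -> int:
    """Symbol-by-symbol decision: nearest PAM-5 level."""
    return min(PAM5_LEVELS, key=lambda level: abs(x - level))


class SymbolBySymbolReceiver:
    """One dimension of the FIG. 40 datapath with a fixed 3-tap DFE (illustrative)."""

    def __init__(self, dfe_taps: Sequence[float]) -> None:
        if len(dfe_taps) != 3:
            raise ValueError("this sketch uses exactly three DFE taps")
        self.dfe_taps = list(dfe_taps)
        self.v: List[int] = [0, 0, 0]  # [V_0F, V_1F, V_2F] from stages 4040-4044

    def step(self, sample: float) -> Tuple[int, float, float]:
        """Return (tentative decision 44, error 42, soft decision 43)."""
        dfe_out = sum(b * v for b, v in zip(self.dfe_taps, self.v))  # DFE 4034
        soft = sample - dfe_out            # adder 4032
        decision = slice_pam5(soft)        # symbol-by-symbol decoder 4036
        error = soft - decision            # adder 4038: decoder input minus output
        self.v = [decision] + self.v[:2]   # shift through delay stages 4040-4044
        return decision, error, soft
```

As a usage example, consider a channel that leaks half of each symbol into the next symbol period. With DFE taps `[0.5, 0.0, 0.0]`, feeding the received samples through `step` recovers every transmitted symbol with zero error, because the first tap cancels the leakage exactly.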

[0301] In each of the cases described in connection with FIGS. 39 and 40, it will be understood that the surviving elements of the decoder section are all present in a fully functional Viterbi decoder system with ISI compensation. Such a system is described in co-pending U.S. patent application entitled System and Method for High-Speed Decoding and ISI Compensation in a Multi-Pair Transceiver System, filed on instant date herewith and commonly owned by the assignee of the present invention, the entire contents of which are expressly incorporated by reference. As decisions are made with regard to the desirability of maintaining the circuitry in a fully operational condition or truncating certain computational sections in an effort to reduce power dissipation, the system need only remove power from certain identified portions of the circuitry, with other identified portions allowed to remain powered-up in the active signal path. No additional component circuit elements need be provided.

[0302] The dynamic power regulation methodology of the present invention can also be applied to any other component module of a communication system, so long as that module is able to provide a given minimal level of performance with a truncated functional representation or with truncated circuitry. Of course, such minimal performance levels will obtain in only certain situations and are dependent on external factors, particularly the transmission channel characteristics. However, these situations frequently appear in a substantial number of applications or installations. An integrated circuit transceiver capable of adaptively configuring itself to provide a “just sufficient” level of performance while operating at the lowest obtainable power dissipation levels would lend itself to almost universal application.

[0303] The present invention further provides a method and a timing recovery system for generating a set of clock signals in a processing system. The set of clock signals includes a set of sampling clock signals. The processing system includes a set of processing subsystems, each of which includes an analog section. Each of the analog sections operates in accordance with a corresponding sampling clock signal. An example of the processing system is a gigabit transceiver. In this case, the processing subsystems are the constituent transceivers.

[0304] The present invention can be used to generate and distribute clock signals in a gigabit transceiver of a Gigabit Ethernet communication system such that the effect of switching noise coupled from one clock domain to another clock domain is minimized. By “clock domain”, it is meant the circuit blocks that operate according to transitions of a particular clock signal. For ease of explanation, the present invention will be described in detail as applied to this exemplary application. However, this is not to be construed as a limitation of the present invention.

[0305] In order to appreciate the advantages of the present invention, it will be beneficial to describe the invention in the context of an exemplary bi-directional communication device, such as an Ethernet transceiver. The particular exemplary implementation chosen is depicted in FIG. 1, which is a simplified block diagram of a multi-pair communication system operating in conformance with the IEEE 802.3ab standard (also termed 1000BASE-T) for 1 gigabit per second (Gb/s) Ethernet full-duplex communication over four twisted pairs of Category-5 copper wires.

[0306] In FIG. 1, the communication system is represented as a point-to-point system in order to simplify the explanation, and includes two main transceiver blocks 102 and 104, coupled together via four twisted-pair cables 112 a, b, c and d. Each of the wire pairs 112 a, b, c, d is coupled to each of the transceiver blocks 102, 104 through a respective one of four line interface circuits 106. Each of the wire pairs 112 a, b, c, d facilitates communication of information between corresponding pairs of four pairs of transmitter/receiver circuits (constituent transceivers) 108. Each of the constituent transceivers 108 is coupled between a respective line interface circuit 106 and a Physical Coding Sublayer (PCS) block 110. At each of the transceiver blocks 102 and 104, the four constituent transceivers 108 are capable of operating simultaneously at 250 megabits of information data per second (Mb/s) each, and are coupled to the corresponding remote constituent transceivers through respective line interface circuits to facilitate full-duplex bi-directional operation. Thus, 1 Gb/s communication throughput of each of the transceiver blocks 102 and 104 is achieved by using four 250 Mb/s (125 Mbaud at 2 information data bits per symbol) constituent transceivers 108 for each of the transceiver blocks 102, 104 and four pairs of twisted copper cables to connect the two transceiver blocks 102, 104 together.
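The throughput figures in the preceding paragraph can be checked with simple arithmetic. All values below are taken from the text; only the variable names are illustrative.

```python
BAUD_PER_PAIR = 125_000_000   # 125 Mbaud on each twisted pair
BITS_PER_SYMBOL = 2           # information data bits per symbol
NUM_PAIRS = 4                 # four constituent transceivers

per_pair_bps = BAUD_PER_PAIR * BITS_PER_SYMBOL   # 250 Mb/s per pair
aggregate_bps = per_pair_bps * NUM_PAIRS         # 1 Gb/s per transceiver block
print(per_pair_bps, aggregate_bps)  # 250000000 1000000000
```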

[0307] The exemplary communication system of FIG. 1 has a superficial resemblance to a 100BASE-T4 system, but is configured to operate at ten times the bit rate. As such, it should be understood that certain system performance characteristics, such as sampling rates and the like, will be consequently higher and cause a greater degree of power consumption. Also, at gigabit data rates over potentially noisy channels, a proportionately greater degree of signal processing is required in many instances to ensure an adequate degree of signal fidelity and quality.

[0308] FIG. 2 is a simplified block diagram of the functional architecture and internal construction of an exemplary transceiver block, indicated generally at 200, such as transceiver 102 of FIG. 1. Since the illustrative transceiver application relates to gigabit Ethernet transmission, the transceiver will be referred to as the “gigabit transceiver”. For ease of illustration and description, FIG. 2 shows only one of the four 250 Mb/s constituent transceivers which operate simultaneously (termed herein 4-D operation). However, since the operations of the four constituent transceivers are necessarily interrelated, certain blocks and signal lines in the exemplary embodiment of FIG. 2 perform four-dimensional operations and carry four-dimensional (4-D) signals, respectively. By 4-D, it is meant that the data from the four constituent transceivers are used simultaneously. In order to clarify signal relationships in FIG. 2, thin lines correspond to 1-dimensional functions or signals (i.e., relating to only a single constituent transceiver), and thick lines correspond to 4-D functions or signals (relating to all four constituent transceivers).

[0309] Referring to FIG. 2, the gigabit transceiver 200 includes a Gigabit Medium Independent Interface (GMII) block 202 subdivided into a receive GMII circuit 202R and a transmit GMII circuit 202T. The transceiver also includes a Physical Coding Sublayer (PCS) block 204, subdivided into a receive PCS circuit 204R and a transmit PCS circuit 204T, a pulse shaping filter 206, a digital-to-analog (D/A) converter block 208, and a line interface block 210, all generally encompassing the transmitter portion of the transceiver.

[0310] The receiver portion generally includes a highpass filter 212, a programmable gain amplifier (PGA) 214, an analog-to-digital (A/D) converter 216, an automatic gain control (AGC) block 220, a timing recovery block 222, a pair-swap multiplexer block 224, a demodulator 226, an offset canceller 228, a near-end crosstalk (NEXT) canceller block 230 having three constituent NEXT cancellers, and an echo canceller 232.

[0311] The gigabit transceiver 200 also includes an A/D first-in-first-out buffer (FIFO) 218 to facilitate proper transfer of data from the analog clock region to the receive clock region, and a loopback FIFO block (LPBK) 234 to facilitate proper transfer of data from the transmit clock region to the receive clock region. The gigabit transceiver 200 can optionally include an additional adaptive filter to cancel far-end crosstalk noise (FEXT canceller).

[0312] In operational terms, on the transmit path, the transmit section 202T of the GMII block receives data from the Media Access Control (MAC) module in byte-wide format at the rate of 125 MHz and passes them to the transmit section 204T of the PCS block via the FIFO 201. The FIFO 201 ensures proper data transfer from the MAC layer to the physical (PHY) layer, since the transmit clock of the PHY layer is not necessarily synchronized with the clock of the MAC layer. In one embodiment, this small FIFO 201 has from about three to about five memory cells to accommodate the elasticity requirement, which is a function of frame size and frequency offset.
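A back-of-the-envelope sketch of why the elasticity requirement depends on frame size and frequency offset: over one frame, two clocks that differ by a given number of parts per million drift apart by that fraction of the frame length. The 1518-byte frame and the ±100 ppm tolerance per clock used below are assumptions for illustration, not values stated in this text, and `slip_bytes` is a hypothetical helper.

```python
def slip_bytes(frame_bytes: int, offset_ppm: float) -> float:
    """Byte periods gained or lost over one frame when the write (MAC) and
    read (PHY) clocks differ by offset_ppm parts per million."""
    return frame_bytes * offset_ppm * 1e-6


# Assumed values: 1518-byte frame, each clock within +/-100 ppm of nominal,
# so the worst-case relative offset is 200 ppm.
worst_case = slip_bytes(1518, 2 * 100)
print(f"worst-case slip per frame: {worst_case:.4f} bytes")  # 0.3036
```

Under these assumed values the drift per frame is well under one byte, which is at least consistent with a FIFO of only a few cells; longer frames or larger offsets scale the slip linearly.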

[0313] The PCS transmit section 204T performs certain scrambling operations and, in particular, is responsible for encoding digital data into the requisite codeword representations appropriate for transmission. In the illustrated embodiment of FIG. 2, the transmit PCS section 204T incorporates a coding engine and signal mapper that implements a trellis coding architecture, such as required by the IEEE 802.3ab specification for gigabit transmission.

[0314] In accordance with this encoding architecture, the PCS transmit section 204T generates four 1-D symbols, one for each of the four constituent transceivers. The 1-D symbol generated for the constituent transceiver depicted in FIG. 2 is filtered by the pulse shaping filter 206. This filtering assists in reducing the radiated emission of the output of the transceiver such that it falls within the parameters required by the Federal Communications Commission. The pulse shaping filter 206 is implemented so as to define a transfer function of 0.75+0.25z⁻¹. This particular implementation is chosen so that the power spectrum of the output of the transceiver falls below the power spectrum of a 100BASE-TX signal. 100BASE-TX is a widely used and accepted Fast Ethernet standard for 100 Mb/s operation on two pairs of Category-5 twisted pair cables. The output of the pulse shaping filter 206 is converted to an analog signal by the D/A converter 208 operating at 125 MHz. The analog signal passes through the line interface block 210, and is placed on the corresponding twisted pair cable.
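The pulse shaping filter 206 can be sketched as a two-tap FIR implementing the stated transfer function 0.75+0.25z⁻¹. The function names are hypothetical; `magnitude_response` simply evaluates the transfer function on the unit circle to show how the filter tilts the spectrum.

```python
import cmath
import math


def pulse_shape(symbols):
    """Apply H(z) = 0.75 + 0.25 z^-1 to a sequence of 1-D symbols."""
    out, prev = [], 0.0
    for a in symbols:
        out.append(0.75 * a + 0.25 * prev)
        prev = a
    return out


def magnitude_response(omega):
    """|H(e^{j*omega})| for omega in radians per sample."""
    return abs(0.75 + 0.25 * cmath.exp(-1j * omega))


print(pulse_shape([1, 0, 0]))  # [0.75, 0.25, 0.0]
print(magnitude_response(0.0), magnitude_response(math.pi))
```

The gain is 1 at DC and falls to 0.5 (about -6 dB) at the Nyquist frequency, which is 62.5 MHz at 125 Mbaud, so the filter attenuates the upper part of the transmit spectrum.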

[0315] On the receive path, the line interface block 210 receives an analog signal from the twisted pair cable. The received analog signal is preconditioned by the highpass filter 212 and the PGA 214 before being converted to a digital signal by the A/D converter 216 operating at a sampling rate of 125 MHz. The timing of the A/D converter 216 is controlled by the output of the timing recovery block 222. The resulting digital signal is properly transferred from the analog clock region to the receive clock region by the A/D FIFO 218. The output of the A/D FIFO 218 is also used by the AGC 220 to control the operation of the PGA 214.

[0316] The output of the A/D FIFO 218, along with the outputs from the A/D FIFOs of the other three constituent transceivers, is input to the pair-swap multiplexer block 224. The pair-swap multiplexer block 224 uses the 4-D pair-swap control signal from the receive section 204R of the PCS block to sort out the four input signals and send the correct signals to the respective feedforward equalizers 26 of the demodulator 226. This pair-swapping control is needed for the following reason. The trellis coding methodology used for the gigabit transceivers (102 and 104 of FIG. 1) is based on the fact that a signal on each twisted pair of wire corresponds to a respective 1-D constellation, and that the signals transmitted over four twisted pairs collectively form a 4-D constellation. Thus, for the decoding to work, each of the four twisted pairs must be uniquely identified with one of the four dimensions. Any undetected swapping of the four pairs would result in erroneous decoding. In an alternate embodiment of the gigabit transceiver, the pair-swapping control is performed by the demodulator 226, instead of the combination of the PCS receive section 204R and the pair-swap multiplexer block 224.
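Functionally, the pair-swap multiplexer applies a permutation that maps wire pairs onto code dimensions. In the sketch below the 4-D pair-swap control signal is assumed to be encoded as a list in which entry `d` names the input pair carrying dimension `d`; that encoding and the name `pair_swap` are hypothetical.

```python
def pair_swap(inputs, control):
    """Route inputs to dimensions: control[d] is the index of the input
    (wire pair) that carries dimension d of the 4-D constellation."""
    if sorted(control) != list(range(len(inputs))):
        raise ValueError("pair-swap control must be a permutation")
    return [inputs[src] for src in control]


print(pair_swap(["B", "A", "D", "C"], [1, 0, 3, 2]))  # ['A', 'B', 'C', 'D']
```

Rejecting any control word that is not a permutation reflects the requirement that each pair be uniquely identified with one dimension.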

[0317] The demodulator 226 includes a feed-forward equalizer (FFE) 26 for each constituent transceiver, coupled to a deskew memory circuit 36 and a decoder circuit 38, implemented in the illustrated embodiment as a trellis decoder. The deskew memory circuit 36 and the trellis decoder 38 are common to all four constituent transceivers. The FFE 26 receives the received signal intended for it from the pair-swap multiplexer block 224. The FFE 26 is suitably implemented to include a precursor filter 28, a programmable inverse partial response (IPR) filter 30, a summing device 32, and an adaptive gain stage 34. The FFE 26 is a least-mean-squares (LMS) type adaptive filter which is configured to perform channel equalization as will be described in greater detail below.

[0318] The precursor filter 28 generates a precursor to the input signal 2. This precursor is used for timing recovery. The transfer function of the precursor filter 28 might be represented as −g+z⁻¹, with g equal to 1/16 for short cables (less than 80 meters) and 1/8 for long cables (more than 80 meters). The determination of the length of a cable is based on the gain of the coarse PGA 14 of the programmable gain block 214.
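A minimal sketch of the precursor filter 28 with transfer function −g+z⁻¹. The text selects g from the cable length as inferred from the coarse PGA gain; here the length in meters is passed in directly, and treating exactly 80 m as a long cable is an assumption, since the text leaves that boundary open. Both function names are hypothetical.

```python
def precursor_g(cable_length_m):
    """g = 1/16 for short cables and 1/8 for long cables; the handling of
    exactly 80 m is an assumption."""
    return 1 / 16 if cable_length_m < 80 else 1 / 8


def precursor_filter(x, g):
    """y[n] = -g*x[n] + x[n-1], i.e. H(z) = -g + z^-1."""
    out, prev = [], 0.0
    for v in x:
        out.append(-g * v + prev)
        prev = v
    return out


print(precursor_filter([1, 0, 0], precursor_g(50)))  # [-0.0625, 1.0, 0.0]
```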

[0319] The programmable IPR filter 30 compensates the ISI (intersymbol interference) introduced by the partial response pulse shaping in the transmitter section of a remote transceiver which transmitted the analog equivalent of the digital signal 2. The transfer function of the IPR filter 30 may be expressed as 1/(1+Kz⁻¹). In the present example, K has an exemplary value of 0.484375 during startup, and is slowly ramped down to zero after convergence of the decision feedback equalizer included inside the trellis decoder 38. The value of K may also be any positive value strictly less than 1.
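The IPR filter is a one-pole recursion, y[n] = x[n] − K·y[n−1], which exactly undoes a partial response of the form 1+Kz⁻¹. The sketch below shows that inversion with the startup value K = 0.484375 quoted above; the function names are hypothetical, and the single pole at z = −K is what requires K to stay below 1.

```python
def partial_response(x, k):
    """y[n] = x[n] + k*x[n-1], i.e. H(z) = 1 + k z^-1."""
    out, prev = [], 0.0
    for v in x:
        out.append(v + k * prev)
        prev = v
    return out


def ipr_filter(x, k):
    """IIR inverse y[n] = x[n] - k*y[n-1], i.e. H(z) = 1/(1 + k z^-1).
    The single pole at z = -k is stable for 0 <= k < 1."""
    if not 0 <= k < 1:
        raise ValueError("k must satisfy 0 <= k < 1")
    out, prev = [], 0.0
    for v in x:
        y = v - k * prev
        out.append(y)
        prev = y
    return out


K_STARTUP = 0.484375  # startup value of K quoted in the text
symbols = [1, -2, 0, 2, -1]
recovered = ipr_filter(partial_response(symbols, K_STARTUP), K_STARTUP)
print(recovered)  # [1, -2.0, 0.0, 2.0, -1.0]
```

Allowing k = 0 matches the description of K being ramped down to zero, at which point the IPR filter becomes a pass-through.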

[0320] The summing device 32 receives the output of the IPR filter 30 and subtracts therefrom adaptively derived cancellation signals received from the adaptive filter block, namely signals developed by the offset canceller 228, the NEXT cancellers 230, and the echo canceller 232. The offset canceller 228 is an adaptive filter which generates an estimate of signal offset introduced by component circuitry of the transceiver's analog front end, particularly offsets introduced by the PGA 214 and the A/D converter 216.

[0321] The three NEXT cancellers 230 may also be described as adaptive filters and are used, in the illustrated embodiment, for modeling the NEXT impairments in the received signal caused by interference generated by symbols sent by the three local transmitters of the other three constituent transceivers. These impairments are recognized as being caused by a crosstalk mechanism between neighboring pairs of cables, thus the term near-end crosstalk, or NEXT. Since each receiver has access to the data transmitted by the other three local transmitters, it is possible to approximately replicate the NEXT impairments through filtering. Referring to FIG. 2, the three NEXT cancellers 230 filter the signals sent by the PCS block to the other three local transmitters and produce three signals replicating the respective NEXT impairments. By subtracting these three signals from the output of the IPR filter 30, the NEXT impairments are approximately cancelled.

[0322] Due to the bi-directional nature of the channel, each local transmitter causes an echo impairment on the received signal of the local receiver with which it is paired to form a constituent transceiver. In order to remove this impairment, an echo canceller 232 is provided, which may also be characterized as an adaptive filter, and is used, in the illustrated embodiment, for modeling the signal impairment due to echo. The echo canceller 232 filters the signal sent by the PCS block to the local transmitter associated with the receiver, and produces an approximate replica of the echo impairment. By subtracting this replica signal from the output of the IPR filter 30, the echo impairment is approximately cancelled.
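The echo and NEXT cancellers share one idea: filter locally known transmit symbols to build a replica of the coupled impairment, then subtract it. This section does not give their adaptation rule, so the sketch assumes a plain LMS-adapted FIR; the tap count, step size and the name `lms_canceller` are hypothetical.

```python
def lms_canceller(tx, rx, num_taps, mu):
    """Adapt an FIR replica of the coupling path from the local transmit
    symbols `tx` into the received samples `rx`. Returns the final taps and
    the residual left after subtracting the replica."""
    w = [0.0] * num_taps
    history = [0.0] * num_taps          # most recent transmit symbol first
    residual = []
    for t, r in zip(tx, rx):
        history = [t] + history[:-1]
        replica = sum(wi * hi for wi, hi in zip(w, history))
        e = r - replica                 # cancelled signal, also the LMS error
        residual.append(e)
        w = [wi + mu * e * hi for wi, hi in zip(w, history)]
    return w, residual
```

For NEXT, three such filters would be driven by the symbols of the other three local transmitters and their replicas summed before subtraction. In a live receiver the far-end signal is also present in `rx`; because it is uncorrelated with the local symbols it does not bias the taps on average, but it adds adaptation noise.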

[0323] The adaptive gain stage 34 receives the processed signal from the summing circuit 32 and fine tunes the signal path gain using a zero-forcing LMS algorithm. Since this adaptive gain stage 34 trains on the basis of error signals generated by the adaptive filters 228, 230 and 232, it provides a more accurate signal gain than the one provided by the PGA 214 in the analog section.
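A zero-forcing LMS gain update correlates the error with the decision rather than with the input sample. The sketch below is a simplified 1-D model under stated assumptions: a PAM-5 alphabet, a local nearest-level slicer in place of the decoder's tentative decision, and a hypothetical step size; `adapt_gain` is not a name from the text.

```python
PAM5_LEVELS = (-2, -1, 0, 1, 2)  # assumed symbol alphabet


def slicer(y):
    """Nearest PAM-5 level, standing in for the tentative decision."""
    return min(PAM5_LEVELS, key=lambda level: abs(y - level))


def adapt_gain(samples, mu, g=1.0):
    """Zero-forcing LMS gain update g <- g - mu * e * d, where the error
    e = y - d is correlated with the decision d, not the input."""
    for x in samples:
        y = g * x
        d = slicer(y)
        e = y - d
        g -= mu * e * d
    return g
```

If the upstream path leaves the symbols scaled by 0.8, the gain converges toward 1/0.8 = 1.25.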

[0324] The output of the adaptive gain stage 34, which is also the output of the FFE 26, is input to the deskew memory circuit 36. The deskew memory 36 is a four-dimensional function block, i.e., it also receives the outputs of the three FFEs of the other three constituent transceivers. There may be a relative skew in the outputs of the four FFEs, which are the four signal samples representing the four symbols to be decoded. This relative skew can be up to 50 nanoseconds, and is due to the variations in the way the copper wire pairs are twisted. In order to correctly decode the four symbols, the four signal samples must be properly aligned. The deskew memory aligns the four signal samples received from the four FFEs, then passes the deskewed four signal samples to a decoder circuit 38 for decoding.
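At 125 Mbaud one symbol period is 8 ns, so a 50 ns skew spans 6.25 symbol periods, and the deskew memory must hold at least seven symbols per lane. The sketch assumes the per-lane lag is already known as a whole number of symbol periods; how it is measured is not covered here, and the function names are hypothetical.

```python
import math

SYMBOL_PERIOD_NS = 8.0  # 1 / 125 MHz


def max_skew_symbols(skew_ns):
    """Symbol periods of buffering needed to absorb a given skew."""
    return math.ceil(skew_ns / SYMBOL_PERIOD_NS)


def deskew(streams, lags):
    """Read each lane from a buffer with its own offset so that the k-th
    output is the aligned 4-D sample (lane0[k+lag0], ..., lane3[k+lag3])."""
    n = min(len(s) - lag for s, lag in zip(streams, lags))
    return [tuple(s[k + lag] for s, lag in zip(streams, lags)) for k in range(n)]


print(max_skew_symbols(50))  # 7
print(deskew([[1, 2, 3], [0, 1, 2, 3], [0, 0, 1, 2, 3], [1, 2, 3]], [0, 1, 2, 0]))
```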

[0325] In the context of the exemplary embodiment, the data received at the local transceiver was encoded before transmission, at the remote transceiver. In the present case, data might be encoded using an 8-state four-dimensional trellis code, and the decoder 38 might therefore be implemented as a trellis decoder. In the absence of intersymbol interference (ISI), a proper 8-state Viterbi decoder would provide optimal decoding of this code. However, in the case of Gigabit Ethernet, the Category-5 twisted pair cable introduces a significant amount of ISI. In addition, the partial response filter of the remote transmitter on the other end of the communication channel also contributes some ISI. Therefore, the trellis decoder 38 must decode both the trellis code and the ISI, at the high rate of 125 MHz. In the illustrated embodiment of the gigabit transceiver, the trellis decoder 38 includes an 8-state Viterbi decoder, and uses a decision-feedback sequence estimation approach to deal with the ISI components.

[0326] The 4-D output of the trellis decoder 38 is provided to the PCS receive section 204R. The receive section 204R of the PCS block de-scrambles and decodes the symbol stream, then passes the decoded packets and idle stream to the receive section 202R of the GMII block, which passes them to the MAC module. The 4-D outputs, which are the error and tentative decision, respectively, are provided to the timing recovery block 222, whose output controls the sampling time of the A/D converter 216. One of the four components of the error and one of the four components of the tentative decision correspond to the receiver shown in FIG. 2, and are provided to the adaptive gain stage 34 of the FFE 26 to adjust the gain of the equalizer signal path. The error component portion of the decoder output signal is also provided, as a control signal, to adaptation circuitry incorporated in each of the adaptive filters 230 and 232. Adaptation circuitry is used for the updating and training process of filter coefficients.

[0327] For the exemplary gigabit transceiver system 200 described above and shown in FIG. 2, there is a PHY Control system (not shown) which provides control signals to the blocks shown in FIG. 2, including the timing recovery block 222, to control their functions.

[0328] For the exemplary gigabit transceiver system 200 described above and shown in FIG. 2, there are design considerations regarding the allocation of boundaries of the clock domains. These design considerations are dependent on the clocking relationship between transmitters and receivers in a gigabit transceiver. Therefore, this clocking relationship will be discussed first.

[0329] During a bidirectional communication between two gigabit transceivers 102, 104 (FIG. 1), through a process called “auto-negotiation”, one of the gigabit transceivers assumes the role of the master while the other assumes the role of the slave. When a gigabit transceiver assumes one of the two roles with respect to the remote gigabit transceiver, each of its constituent transceivers assumes the same role with respect to the corresponding one of the remote constituent transceivers. Each constituent transceiver 108 is constructed such that it can be dynamically configured to act as either the master or the slave with respect to a remote constituent transceiver 108 during a bidirectional communication. The clocking relationship between the transmitter and receiver inside the constituent transceiver 108 depends on the role of the constituent transceiver (i.e., master or slave) and is different for each of the two cases.

[0330] FIG. 19 illustrates the general clocking relationship, at the conceptual level, between the transmitter and the receiver of the gigabit Ethernet transceiver (102 or 104) of FIG. 1. In this conceptual FIG. 19, the transmitter TX represents the four constituent transmitters and the receiver RX represents the four constituent receivers.

[0331] Referring to FIG. 19, the gigabit transceiver 1901 acts as the master while the gigabit transceiver 1902 acts as the slave. The master 1901 includes a transmitter 1910 and a receiver 1912. The slave 1902 includes a transmitter 1920 and a receiver 1922. The transceiver 1901 (respectively, 1902) receives from the GMII 202T (FIG. 2) the data to be transmitted TXD via its input 1913 (respectively, 1923), and the GMII transmit clock GTX_CLK (this clock is also called “gigabit transmit clock” in the IEEE 802.3ab standard) via its input 1915 (respectively, 1925). The transceiver 1901 (respectively, 1902) sends to the GMII 202R (FIG. 2) the received data RXD via its output 1917 (respectively, 1927), and the GMII receive clock RX_CLK (this clock is also called “gigabit receive clock” in the IEEE 802.3ab standard) via its output 1919 (respectively, 1929). It is noted that the clocks GTX_CLK and RX_CLK may be different from the transmit clock TCLK and receive clock RCLK, respectively, of a gigabit transceiver.

[0332] The receiver 1922 of the slave 1902 synchronizes its receive clock to the transmit clock of the transmitter 1910 of the master 1901 in order to properly receive the data transmitted by the transmitter 1910. The transmit clock of the transmitter 1920 of the slave 1902 is essentially the same as the receive clock of the receiver 1922; thus it is also synchronized to the transmit clock of the transmitter 1910 of the master 1901.

[0333] The receiver 1912 of the master 1901 is synchronized to the transmit clock of the transmitter 1920 of the slave 1902 in order to properly receive data sent by the transmitter 1920. Because the receive and transmit clocks of the slave 1902 are synchronized to the transmit clock of transmitter 1910 of the master 1901, the receive clock of the receiver 1912 is synchronized to the transmit clock of the transmitter 1910 with a phase delay (due to the twisted pairs of cables). Thus, in the absence of jitter, after synchronization, the receive clock of receiver 1912 tracks the transmit clock of transmitter 1910 with a phase delay. In other words, in principle, the receive clock of receiver 1912 has the same frequency as the transmit clock of transmitter 1910, but with a fixed phase delay.

[0334] However, in the presence of jitter or a change in the cable response, these two clocks may have different instantaneous frequencies (frequency is the derivative of phase with respect to time). This is because, at the master 1901, the receiver 1912 needs to dynamically change the relative phase of its receive clock with respect to the transmit clock of transmitter 1910 in order to track jitter in the incoming signal from the transmitter 1920 or to compensate for the change in cable response. Thus, in practice, the transmit and receive clocks of the master 1901 may actually be independent. At the master, this independence creates an asynchronous boundary between the transmit clock domain and the receive clock domain. By “transmit clock domain”, it is meant the region where circuit blocks are operated in accordance with transitions in the transmit clock signal TCLK. By “receive clock domain”, it is meant the region where circuit blocks are operated in accordance with transitions in the receive clock signal RCLK. In order to avoid any loss of data when data cross the asynchronous boundary between the transmit clock domain and the receive clock domain inside the master 1901, FIFOs are used at this asynchronous boundary. For the exemplary structure of the gigabit transceiver shown in FIG. 2, FIFOs 234 (FIG. 2) are placed at this asynchronous boundary. Since a constituent transceiver 108 (FIG. 1) is constructed such that it can be configured as a master or a slave, the FIFOs 234 (FIG. 2) are also included in the slave 1902 (FIG. 19).

[0335] At the slave 1902, the transmit clock TCLK of transmitter 1920 is phase locked to the receive clock RCLK of receiver 1922. Since TCLK may therefore differ from GTX_CLK, a FIFO 1930 is needed for proper transfer of data TXD from the MAC (not shown) to the transmitter 1920. The depth of the FIFO 1930 must be sufficient to absorb any loss during the length of a data packet. The multiplexer 1932 allows either the GTX_CLK or the receive clock RCLK of receiver 1922 to be used as the signal RX_CLK 1929. When the GTX_CLK is used as the RX_CLK 1929, the FIFO 1934 is needed to ensure proper transfer of data RXD 1927 from the receiver 1922 to the MAC.

[0336] For the conceptual block diagram of FIG. 19, there is one transmit clock TCLK and one receive clock RCLK for a gigabit transceiver. The transmit clock TCLK is common to all four constituent transceivers since data transmitted simultaneously on all four twisted pairs of cable correspond to 4D symbols. Since data received from the four twisted pairs of cable are to be decoded simultaneously into 4D symbols, it is an efficient design to have all the digital processing blocks clocked by one clock signal RCLK. However, due to the different cable responses of the four twisted pairs of cable, the A/D converter 216 (FIG. 2) of each of the four constituent transceivers requires a distinct sampling clock signal. Thus, in addition to the signals TCLK and RCLK, the gigabit transceiver system 200 requires four sampling clock signals.

[0337] There is an alternative structure for the gigabit transceiver where the partition of clock domains is different from the one shown in FIG. 2. This alternative structure (not shown explicitly) is similar to the one shown in FIG. 2 and only differs in that its transmit clock domain includes both the transmit clock domain and the receive clock domain of FIG. 2, and that the FIFO block 234 is not needed. In other words, in this alternative structure, the receive clock RCLK is the same as the transmit clock TCLK, and the transmit clock TCLK is used to clock both the transmitter and most of the receiver. The advantage of this alternative structure is that there is no asynchronous boundary between the transmit region and most of the receive region, thus allowing the echo canceller 232 and NEXT cancellers 230 to work with only one clock signal. The disadvantage of this alternative structure is that there is a potential for a performance penalty at the master when the constituent transceivers are tracking jitter. As a result of tracking jitter, the relative phase of a sampling clock signal with respect to the transmit clock TCLK may vary dynamically. This could cause the A/D converter to sample at noisy instants where transistors in circuit blocks operating according to the clock signal TCLK are switching. Thus, the alternative structure is not as good as the structure shown in FIG. 2 with respect to the switching noise problem.

[0338] FIG. 20 is a simplified block diagram of an embodiment of the timing recovery system constructed according to the present invention and applied to the gigabit transceiver architecture of FIG. 2. The timing recovery system 222 (FIGS. 2 and 3) generates the different clock signals for the exemplary gigabit transceiver shown in FIG. 2, namely, the sampling clock signals ACLK0, ACLK1, ACLK2, ACLK3, the receive clock signal RCLK, and the transmit clock signal TCLK.

[0339] The timing recovery system 222 includes a set of phase detectors 2002, 2012, 2022, 2032, a set of loop filters 2006, 2016, 2026, 2036, a set of numerically controlled oscillators (NCO) 2008, 2018, 2028, 2038 and a set of phase selectors 2010, 2020, 2030, 2040, 2050, 2060. The adders 2004, 2014, 2024, 2034 are shown for conceptual illustration purposes only. In practice, these adders are implemented within the respective phase detectors 2002, 2012, 2022, 2032. The RCLK Offset is used to adjust the phase of the receive clock signal RCLK in order to reduce the effects of switching noise on the sampling operations of the corresponding A/D converters 216 (FIG. 2). Three of the four signals ACLK0 Offset, ACLK1 Offset, ACLK2 Offset, ACLK3 Offset are used to slightly adjust the phases of the respective sampling clocks ACLK0 through ACLK3 in order to further reduce these effects of switching noise. The phase adjustments of the receive clock RCLK and the sampling clocks ACLK0-3 are not a necessary function of the timing recovery system 222. However, the method and system for generating these phase adjustment signals constitute another novel aspect of the present invention and will be described in detail later.

[0340] Each of the phase detectors 2002, 2012, 2022, 2032 receives the corresponding 1D component of the 4D slicer error 42 (FIGS. 2 and 3) and the corresponding 1D component of the 4D tentative decision 44 (FIGS. 2 and 3) from the decoder 38 (FIG. 2) to generate a corresponding phase error. The phase errors 0 through 3 are input to the loop filters 2006, 2016, 2026, 2036, respectively. The loop filters 2006, 2016, 2026, 2036 generate and output filtered phase errors to the NCOs 2008, 2018, 2028, 2038. The loop filters 2006, 2016, 2026, 2036 can be of any order. In one embodiment, the loop filters are of second order. The NCOs 2008, 2018, 2028, 2038 generate phase control signals from the filtered phase errors. The phase selectors 2010, 2020, 2030, 2040 receive corresponding phase control signals from the NCOs 2008, 2018, 2028, 2038, respectively. Each of the phase selectors 2010, 2020, 2030, 2040 selects one out of several phases of the multi-phase signal 2070 based on the value of the corresponding phase control signal, and outputs the corresponding sampling clock signal. In one embodiment of the invention, the multi-phase signal has 64 phases.
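One channel of this chain (loop filter, NCO, phase selector) can be sketched in discrete time. The phase detector is not modeled, because its algorithm is not specified here, so phase errors are supplied directly. The loop filter is assumed to take the proportional-plus-integral form of L(s) in paragraph [0347]; the gains, the rounding rule in the selector and all names are hypothetical.

```python
import math

NUM_PHASES = 64  # phases of the multi-phase signal 2070 (from the text)


class LoopFilter:
    """Proportional-plus-integral loop filter (assumed form)."""
    def __init__(self, kp, ki):
        self.kp, self.ki, self.integrator = kp, ki, 0.0

    def step(self, phase_error):
        self.integrator += self.ki * phase_error
        return self.kp * phase_error + self.integrator


class NCO:
    """Accumulates the filtered phase error into a phase control value,
    in units of 1/NUM_PHASES of a symbol period."""
    def __init__(self):
        self.phase = 0.0

    def step(self, filtered_error):
        self.phase += filtered_error
        return self.phase


def phase_select(phase_control):
    """Pick the nearest of the NUM_PHASES clock phases, wrapping around."""
    return math.floor(phase_control + 0.5) % NUM_PHASES


def track(phase_errors, kp, ki):
    """Run one channel: loop filter -> NCO -> phase selector."""
    loop_filter, nco = LoopFilter(kp, ki), NCO()
    return [phase_select(nco.step(loop_filter.step(pe))) for pe in phase_errors]


print(track([1.0, 1.0, 1.0], kp=0.5, ki=0.25))  # [1, 2, 3]
```

The NCO acts as an integrator, so together with the integral term of the loop filter the loop is second order, matching the continuous-time model discussed below.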

[0341] The multi-phase signal 2070 is generated by a clock generator 2080. In the exemplary embodiment illustrated in FIG. 20, the clock generator 2080 includes a crystal oscillator 2082, a frequency multiplier 2084 and an 8-phase ring oscillator 2086. The crystal oscillator 2082 produces a 25 MHz clock signal. The frequency multiplier 2084 multiplies the frequency of the 25 MHz clock signal by 40 and produces a 1 GHz clock signal. From the 1 GHz clock signal, the 8-phase ring oscillator 2086 produces the 8 GHz 64-phase signal 2070.
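The numbers in the preceding paragraph fit together as follows. Eight phases of a 1 GHz clock are spaced 125 ps apart, which is the spacing of an 8 GHz clock, and one 8 ns period at the 125 MHz symbol rate contains 8000 / 125 = 64 such steps. The arithmetic below uses only values from the text; reading "8 GHz" as this equivalent phase-step rate is an interpretation.

```python
CRYSTAL_HZ = 25_000_000
MULTIPLIER = 40
RING_PHASES = 8
SYMBOL_RATE_HZ = 125_000_000
PS_PER_SECOND = 10**12

ring_hz = CRYSTAL_HZ * MULTIPLIER                           # 1 GHz
step_ps = PS_PER_SECOND // (ring_hz * RING_PHASES)          # 125 ps per phase step
symbol_period_ps = PS_PER_SECOND // SYMBOL_RATE_HZ          # 8000 ps
phases_per_symbol = symbol_period_ps // step_ps             # 64 phases
equivalent_rate_hz = PS_PER_SECOND // step_ps               # 8 GHz
print(ring_hz, step_ps, phases_per_symbol, equivalent_rate_hz)
```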

[0342] The receive clock signal RCLK, which is used to clock all the circuit blocks in the receive clock domain (which include all the digital signal processing circuit blocks in FIG. 2), can be generated independently of the sampling clock signals ACLK0 through ACLK3. However, for design efficiency, RCLK is chosen to be related to one of the sampling clock signals ACLK0 through ACLK3. For the exemplary embodiment illustrated in FIG. 20, the receive clock signal RCLK is related to the sampling clock signal ACLK0. The receive clock signal RCLK is generated by inputting the sum of the phase control signal output from the NCO 2008 and the RCLK Offset via an adder 2042 to the phase selector 2050. Based on this sum, the phase selector 2050 selects one of the 64 phases of the multi-phase signal 2070 and outputs the receive clock signal RCLK. Thus, when the RCLK Offset is zero, the receive clock signal RCLK is the same as the sampling clock ACLK0.
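The RCLK derivation reduces to modular arithmetic on phase indices. The sketch assumes the NCO output and the RCLK Offset are integer phase control words and that phase selection wraps modulo 64; the function names are hypothetical.

```python
NUM_PHASES = 64


def phase_selector(control):
    """Map an integer phase control word onto one of NUM_PHASES phases."""
    return control % NUM_PHASES


def receive_clock_phases(nco0_control, rclk_offset):
    """Return (ACLK0 phase, RCLK phase). RCLK uses the NCO 2008 output plus
    the RCLK Offset (adder 2042) feeding phase selector 2050."""
    aclk0 = phase_selector(nco0_control)
    rclk = phase_selector(nco0_control + rclk_offset)
    return aclk0, rclk


print(receive_clock_phases(10, 0))   # (10, 10): zero offset, RCLK equals ACLK0
print(receive_clock_phases(60, 8))   # (60, 4): the offset wraps around 64 phases
```

Because RCLK and ACLK0 share the same NCO output, any offset applied to RCLK stays fixed as the loop tracks jitter.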

[0343] As discussed previously in relation to FIG. 19, when the constituent transceiver is configured as the master, its transmit clock TCLK is practically independent of its receive clock RCLK. In FIG. 20, when the constituent transceiver is the master, the transmit clock signal TCLK is generated by inputting the signal TCLK Offset, generated by the PHY Control system of the gigabit transceiver, to the phase selector 2060. Based on the TCLK Offset, the phase selector 2060 selects one of the 64 phases of the multi-phase signal 2070 and produces the transmit clock signal TCLK. When the constituent transceiver is the slave, the transmit clock signal TCLK is generated by inputting the sum of the output of the NCO 2008 and the signal TCLK Offset, via the adder 2042, to the phase selector 2060. Based on this sum, the phase selector 2060 selects one of the 64 phases of the multi-phase signal 2070 and produces the transmit clock signal TCLK. Thus, at the slave, the transmit clock signal TCLK and the receive clock signal RCLK are phase-locked (as discussed previously in relation to FIG. 19).
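The master and slave cases differ only in whether the NCO 2008 output enters the control word of phase selector 2060. As in the RCLK case, this sketch assumes integer phase control words wrapping modulo 64; `transmit_clock_phase` is a hypothetical name.

```python
NUM_PHASES = 64


def transmit_clock_phase(role, nco0_control, tclk_offset):
    """Phase index chosen by phase selector 2060 for TCLK."""
    if role == "master":
        control = tclk_offset                  # independent of the receive loop
    elif role == "slave":
        control = nco0_control + tclk_offset   # locked to the receive loop
    else:
        raise ValueError("role must be 'master' or 'slave'")
    return control % NUM_PHASES


print(transmit_clock_phase("master", 37, 5))  # 5
print(transmit_clock_phase("slave", 37, 5))   # 42
```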

[0344] Referring to FIG. 20, the function performed by the combination of an NCO (2008, 2018, 2028, 2038) followed by a phase selector (2010, 2020, 2030, 2040, 2050, 2060) can also be implemented by analog circuitry. The analog circuitry can be described as follows. Each of the filtered phase errors outputted from the loop filters (2006, 2016, 2026, 2036) would be inputted to a D/A converter to be converted to analog form. Each of the analog filtered phase errors would then be inputted to a voltage-controlled oscillator (VCO). The VCOs would produce the clock signals. The VCOs can be implemented with well-known analog techniques such as those using varactor diodes.

[0345] FIG. 21 is a block diagram illustrating a detailed implementation of the phase detectors 2002, 2012, 2022, 2032, the loop filters 2006, 2016, 2026, 2036, and the NCOs 2008, 2018, 2028, 2038 of FIG. 20.

[0346] The 4D path connecting the phase detectors 2002, 2012, 2022, 2032, the loop filters 2006, 2016, 2026, 2036, the NCOs 2008, 2018, 2028, 2038 and the phase selectors 2010, 2020, 2030, 2040 (FIG. 20) can be thought of as the 4D forward path of a phase locked loop whose 4D feedback path goes, referring now to FIG. 2, from the A/D converters 216 to the demodulator 226 and then back to the timing recovery 222. The input to this phase locked loop is the phase information embedded in the slicer error 42 and the tentative decision 44, and the outputs of the phase locked loop are the phases of the sampling clock signals. This phase locked loop is digital, but it can be approximated by a continuous-time phase locked loop for practical design analysis purposes, as long as the sampling rate is much larger than the bandwidth of the loop. The theoretical transfer function of a continuous-time second-order phase locked loop is:

$$\frac{\Phi(s)}{\Theta(s)} = \frac{K_L s + K_L K_1}{s^2 + K_L s + K_L K_1}$$

[0347] where the transfer function of the loop filter is:

$$L(s) = K_L \left(1 + \frac{K_1}{s}\right) = K_v K_d \left(1 + \frac{K_1}{s}\right)$$

[0348] where $K_v$ is the gain of the voltage-controlled oscillator, $K_d$ is the gain of the phase detector, $K_L = K_v K_d$, and $K_1$ is the gain of the integrator inside the loop filter. For the digital phase locked loop of the present invention, the gain parameters $K_v$ and $K_1$ can be computed from the word lengths and scale factors used in implementing the NCO and the integrator of the loop filter. However, the gain of the phase detector $K_d$ is more conveniently computed by simulation. The gain parameters are used for the design and analysis of the digital phase locked loop.
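As a short worked step, the closed-loop response of [0346] follows from the loop filter of [0347] if, as is usual for this continuous-time approximation, the NCO is modeled as an integrator $1/s$ whose gain $K_v$ is already folded into $L(s)$ (this modeling choice is an assumption, not stated in the text). Matching the denominator to the standard second-order form $s^2 + 2\zeta\omega_n s + \omega_n^2$ then gives the natural frequency and damping factor:

$$G(s) = \frac{L(s)}{s} = \frac{K_L (s + K_1)}{s^2}, \qquad \frac{\Phi(s)}{\Theta(s)} = \frac{G(s)}{1 + G(s)} = \frac{K_L s + K_L K_1}{s^2 + K_L s + K_L K_1}$$

$$\omega_n = \sqrt{K_L K_1}, \qquad \zeta = \frac{K_L}{2\omega_n} = \frac{1}{2}\sqrt{\frac{K_L}{K_1}}$$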

[0349] FIG. 21 shows a phase detector 2110, a first filter 2130, a second filter 2150, an adder 2160 and an NCO 2170. The phase detector 2110 is an exemplary embodiment of the phase detectors 2002, 2012, 2022, 2032 of FIG. 20. The combination of the first filter 2130, the second filter 2150 and the adder 2160 is an exemplary embodiment of the loop filters 2006, 2016, 2026, 2036 of FIG. 20. The NCO 2170 is an exemplary embodiment of the NCOs 2008, 2018, 2028, 2038 of FIG. 20.

[0350] In FIGS. 21 through 23, the numbers in the form “Sn.k” indicate the format of the data, where S denotes a signed number, “n” denotes the total number of bits and “k” denotes the number of bits after the binary point.
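A minimal sketch of what an “Sn.k” label implies numerically: a signed two's-complement word of n bits with k fractional bits has a step size of 2^-k and a range of [-2^(n-1-k), 2^(n-1-k) - 2^-k]. Rounding to the nearest step and saturating on overflow are assumptions made here (the figures do not say how each block rounds or overflows), and `quantize_sn_k` is a hypothetical helper.

```python
def quantize_sn_k(x: float, n: int, k: int) -> float:
    """Round x to the nearest value representable in format Sn.k, saturating at the limits.

    Sn.k = signed two's complement, n total bits, k bits after the binary point.
    Uses Python's round(), i.e. round-half-to-even on ties (an assumption).
    """
    step = 2.0 ** -k
    code = round(x / step)                          # integer code word
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1    # two's-complement code limits
    code = max(lo, min(hi, code))                   # saturate instead of wrapping
    return code * step
```

For example, a hypothetical S8.6 signal has a step of 1/64, so 0.3 is stored as 19/64, while 5.0 saturates to 127/64 and -5.0 to -2.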

[0351] The phase detector 2110 includes a lattice structure having two delay elements 2112, 2118, two multipliers 2114, 2120 and an adder 2122. The phase detector 2110 receives as inputs the corresponding 1D component of the 4D slicer error 42 (FIGS. 2 and 3) and the corresponding 1D component of the 4D tentative decision 44 (FIGS. 2 and 3) from the trellis decoder 38 (FIGS. 2 and 3). For simplicity, in FIG. 21, these two 1D components are labeled as 42A and 44A, respectively. It is understood that, for the phase detector of each of the four constituent transceivers of the gigabit transceiver, a distinct 1D component of the slicer error 42 and a distinct 1D component of the tentative decision 44 are used as inputs. On the upper branch of the lattice structure, the slicer error 42A is delayed by one unit of time (here, one symbol period) via the delay element 2112, then multiplied by the tentative decision 44A to produce a pre-cursor phase error 2115. The pre-cursor phase error 2115, when accumulated over time, represents the correlation between a past slicer error and a present tentative decision, and thus indicates the sampling phase error with respect to the zero-crossing point at the start of the signal pulse (this zero-crossing point is part of the pre-cursor introduced by design to the signal pulse by the precursor filter 28 of the FFE 26 in FIG. 2). On the lower branch of the lattice structure, the tentative decision 44A is delayed by one unit of time via the delay element 2118, then multiplied by the slicer error 42A to produce a post-cursor phase error 2121.

[0352] The post-cursor phase error 2121, when accumulated over time, represents the correlation between a present slicer error and a past tentative decision, and thus indicates the sampling phase error with respect to the level-crossing point in the tail end of the signal pulse. In one embodiment, this level-crossing point is determined by the first tap coefficient of the DFE 312 of FIG. 3. At the zero-crossing point at the start of the signal pulse, the slope of the signal pulse is positive, while at the level-crossing point at the tail end of the signal pulse, the slope of the signal pulse is negative. Thus, the pre-cursor phase error 2115 and the post-cursor phase error 2121 must be combined with opposite signs in the adder 2122. The combination of the pre-cursor and post-cursor phase errors 2115, 2121 produces the phase error associated with one of the sampling clock signals ACLK0-ACLK3. This is the phase error indicated as one of the phase errors 0 through 3 in FIG. 20.

[0353] The phase offset 2102 is one of the sampling clock offset signals ACLK0 Offset through ACLK3 Offset in FIG. 20. The phase offset 2102, when needed, is generated by the PHY Control system of the gigabit transceiver. The phase offset 2102 is delayed by one unit of time and then added, via the adder 2122, to the combination of the pre-cursor phase error 2115 and the post-cursor phase error 2121 to produce an adjusted phase error 2123. The adjusted phase error 2123 is stored in the delay element 2124 and outputted to the first filter 2130 at the next clock transition. The delay element 2124 prevents the propagation delay of the adder 2122 from being concatenated with the propagation delay of the adder 2132 in the first filter 2130.
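The lattice phase detector of [0351] to [0353] can be sketched as a per-symbol behavioral model for one constituent transceiver. It is not the circuit itself: the sign convention (pre-cursor term minus post-cursor term), the use of a constant phase offset, and the function name are assumptions, since the text only says the two terms are combined with opposite signs.

```python
from typing import List, Sequence

def phase_detector(slicer_err: Sequence[float], tent_dec: Sequence[float],
                   offset: float = 0.0) -> List[float]:
    """Behavioral sketch of phase detector 2110 (one output per symbol period).

    pre-cursor term  = e[n-1] * a[n]   (slicer error delayed by element 2112)
    post-cursor term = a[n-1] * e[n]   (tentative decision delayed by element 2118)
    adjusted error   = pre - post + offset delayed by one symbol (adder 2122)
    The adjusted error is registered (element 2124), so each output lags by one symbol.
    """
    out = []
    e_prev = a_prev = 0.0   # contents of delay elements 2112 and 2118
    off_prev = 0.0          # one-symbol delay applied to the phase offset 2102
    reg_2124 = 0.0          # pipeline register 2124
    for e, a in zip(slicer_err, tent_dec):
        pre = e_prev * a
        post = a_prev * e
        adjusted = pre - post + off_prev
        out.append(reg_2124)          # value registered during the previous symbol
        reg_2124 = adjusted
        e_prev, a_prev, off_prev = e, a, offset
    return out
```

Accumulating these outputs over many symbols, as the phase accumulator does, yields the correlation-based timing estimate; a single output is noisy on its own.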

[0354] The first filter 2130, termed the “phase accumulator”, accumulates the phase error 2125 outputted by the phase detector 2110 over a period of time, then outputs the accumulated result at the end of that period. In the exemplary embodiment shown in FIG. 21, this period is 16 symbol periods. The first filter 2130 is an “accumulate-and-dump” filter which includes the adder 2132, a delay element (i.e., register) 2134, and a 16-units-of-time register 2136. The register 2136 outputs a lowpass filtered phase error 2137 at the rate of one per period of the TRSAMP0 2104 clock, that is, one every 16 symbol periods. When the register 2136 outputs the lowpass filtered phase error 2137, the register 2134 is cleared and the accumulation of the phase error 2125 restarts. It is noted that, downstream from the register 2136, circuits are clocked at one sixteenth of the symbol rate.
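The accumulate-and-dump behavior of the first filter 2130 can be sketched as follows. The sketch works on a finished list of phase errors rather than a clocked stream, and `accumulate_and_dump` is a hypothetical name; the running sum plays the role of register 2134, and each emitted value is the filtered phase error handed on toward the multiplier 2140.

```python
from typing import Iterable, List

def accumulate_and_dump(phase_errors: Iterable[float], period: int = 16) -> List[float]:
    """Sum each block of `period` phase errors and emit one value per block.

    The accumulator is cleared after every dump, so consecutive outputs cover
    disjoint blocks. A trailing partial block is not emitted.
    """
    out = []
    acc = 0.0                      # accumulator register (2134 in FIG. 21)
    for i, x in enumerate(phase_errors, start=1):
        acc += x                   # adder 2132
        if i % period == 0:        # end of the 16-symbol dump period
            out.append(acc)
            acc = 0.0              # clear and restart accumulation
    return out
```

Because the output rate is one value per 16 symbols, everything downstream of this filter can run at one sixteenth of the symbol rate, which is the decimation the paragraph describes.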

[0355] The filtered phase error 2137 is inputted to a multiplier 2140, where it is multiplied by a factor other than 1 when the bandwidth of the phase locked loop is to differ from its normal value (which is determined by the design of the filter). In the exemplary embodiment depicted in FIG. 21, the filtered phase error 2137 is multiplied by the value 2 outputted from a multiplexer 2142 when the select signal 2106 indicates that the loop filter bandwidth must be larger than its normal value. This occurs, for example, during startup of the gigabit transceiver. Similarly, although not shown in FIG. 21, when the loop filter bandwidth is to be narrower than its normal value, the filtered phase error 2137 can be multiplied by a value less than 1.

[0356] The output 2144 of the multiplier 2140 is inputted both to the second filter 2150, which is an integrator, and to the adder 2160. The integrator 2150 is an IIR filter having an adder 2152 and a register 2154, operating at one sixteenth of the symbol rate. The integrator 2150 integrates the signal 2144 (which is essentially the filtered phase error 2137) to produce an integrated phase error 2156. The purpose of the phase locked loop is to generate a resulting phase for a sampling clock signal such that the phase error is equal to zero. The purpose of the integrator 2150 in the phase locked loop is to keep the phase error of the resulting phase equal to zero even when there is a static frequency error. Without the integrator 2150, a static frequency error would result in a static phase error, which the phase locked loop would attenuate but not drive exactly to zero. With the integrator 2150 in the phase locked loop, any static phase error is integrated to produce a growing input signal to the NCO 2170, which causes the phase locked loop to correct the static phase error. The integrated phase error 2156 is scaled by a scale factor via a multiplier 2158. This scale factor contributes to the determination of the gain of the integrator 2150. The scaled result 2159 is added to the signal 2144 via the adder 2160.
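The proportional-plus-integral structure of [0355] and [0356] can be sketched as a small class that runs once per dumped phase error. `integ_scale` stands in for the multiplier 2158, whose value the text does not give, and feeding the updated (rather than previously registered) integrator sum to the multiplier 2158 is an assumption; the class and parameter names are hypothetical.

```python
class LoopFilter:
    """Sketch of the FIG. 21 loop filter: bandwidth multiplier, integrator, and adder."""

    def __init__(self, integ_scale: float):
        self.integ_scale = integ_scale  # multiplier 2158 (value not given in the text)
        self.freq_reg = 0.0             # register 2154 of integrator 2150

    def step(self, filtered_err: float, wide_bandwidth: bool = False) -> float:
        """Process one filtered phase error; return the signal sent to the NCO."""
        x = filtered_err * (2.0 if wide_bandwidth else 1.0)  # multiplier 2140 / mux 2142
        self.freq_reg += x                                   # adder 2152 + register 2154
        return x + self.integ_scale * self.freq_reg           # adder 2160
```

Feeding a constant error shows why the integrator matters: the proportional path alone would produce a constant output, while the integral path keeps growing until the loop removes the error, which is how a static frequency offset is tracked with zero residual phase error.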

[0357] The output 2162 of the adder 2160 is inputted to the NCO 2170. The output 2162 is scaled by a scale factor, e.g., 2⁻⁵, via a multiplier 2172. The resulting scaled signal is recursively filtered by an IIR filter formed by an adder 2174 and a register 2176. The IIR filter operates at one sixteenth of the symbol rate. The signal 2178, outputted every 16 symbol periods, is used as the phase control signal to one of the phase selectors 2010, 2020, 2030, 2040, 2050, 2060 (FIG. 20).
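A minimal sketch of the NCO 2170 as a phase accumulator. The 2⁻⁵ scale comes from the text; treating register 2176 as a fractional count of phase-selector steps that wraps modulo 64, and using its integer part as the 6-bit phase control signal, are assumptions, since the register's word length is not given here. The class name is hypothetical.

```python
class NCO:
    """Sketch of NCO 2170: scale the loop filter output and accumulate it."""

    NUM_PHASES = 64  # phase selector positions

    def __init__(self, scale: float = 2.0 ** -5):
        self.scale = scale  # multiplier 2172
        self.reg = 0.0      # register 2176, in (assumed) units of phase steps

    def step(self, loop_filter_out: float) -> int:
        """Update once per 16 symbols; return the phase control index."""
        self.reg = (self.reg + self.scale * loop_filter_out) % self.NUM_PHASES  # adder 2174
        # The extra modulo guards against float rounding leaving reg == 64.0.
        return int(self.reg) % self.NUM_PHASES
```

Because the accumulator integrates its input, a constant loop filter output makes the selected phase advance at a constant rate, which is how the NCO represents a frequency offset between the local and remote clocks.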

[0358] For the embodiment shown in FIG. 21, the gain parameters discussed above are as follows. $K_v$, the gain of the NCO, is 2⁻¹¹ for normal bandwidth mode and 2⁻¹⁰ for high bandwidth mode. $K_1$, the gain of the integrator 2150, is equal to the product of the scaling of the integrator register 2154 (2⁻⁸ in FIG. 21) and the ratio of the phase locked loop sampling rate to the symbol rate (2⁻⁴ in FIG. 21). For the word lengths and scaling indicated in FIG. 21, $K_1$ is therefore equal to 2⁻¹². The gain $K_d$ of the phase detector 2110 is computed by simulation and is equal to 2.2. These parameters are used to compute the theoretical transfer function of the phase locked loop (PLL), which is then compared with the PLL transfer function obtained by simulation. The match is near perfect, confirming the validity of the design parameters.
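The gains listed above can be plugged into the continuous-time model of [0346] and [0347]. The sketch below matches the denominator $s^2 + K_L s + K_L K_1$ to $s^2 + 2\zeta\omega_n s + \omega_n^2$, so $\omega_n = \sqrt{K_L K_1}$ and $\zeta = K_L / (2\omega_n)$. Interpreting the gains as normalized to the symbol period (so that $\omega_n$ is in radians per symbol) is our reading of the $K_1$ definition, not something the text states; the function names are hypothetical.

```python
import math

def pll_params(k_v: float, k_d: float, k_1: float) -> tuple:
    """Return (wn, zeta) for the continuous-time second-order PLL model."""
    k_l = k_v * k_d                 # K_L = K_v * K_d
    wn = math.sqrt(k_l * k_1)       # natural frequency
    zeta = k_l / (2.0 * wn)         # damping factor
    return wn, zeta

def closed_loop_mag(w: float, k_l: float, k_1: float) -> float:
    """|Phi(jw) / Theta(jw)| for the closed-loop transfer function of [0346]."""
    s = 1j * w
    return abs((k_l * s + k_l * k_1) / (s * s + k_l * s + k_l * k_1))

K_D = 2.2          # phase detector gain obtained by simulation
K_1 = 2.0 ** -12   # integrator gain

for mode, k_v in (("normal", 2.0 ** -11), ("high", 2.0 ** -10)):
    wn, zeta = pll_params(k_v, K_D, K_1)
    print(f"{mode:6s} bandwidth mode: wn = {wn:.3e}, zeta = {zeta:.3f}")
```

With these numbers the damping factor is $\tfrac{1}{2}\sqrt{4.4} \approx 1.05$ in normal mode and $\tfrac{1}{2}\sqrt{8.8} \approx 1.48$ in high-bandwidth mode, so the loop is slightly overdamped in both cases. `closed_loop_mag` can be swept over frequency to produce the theoretical curve that is compared with simulation.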

[0359] One embodiment of the system 2100 of FIG. 21 further includes the external control signals PLLFRZ, PLLPVAL, PLLPRST, PLLFVAL, PLLFRST and PLLPRAMP, which are not shown explicitly in FIG. 21.

[0360] The control signal PLLFRZ, when applied, forces the phase error to zero at point 1 of the first filter 2130, and therefore freezes updates of the frequency and/or phase, except for any phase change caused by a non-zero value in the frequency register 2154 of the integrator 2150.

[0361] The control signal PLLPVAL is a 3-bit signal provided by the PHY Control system. It is used to specify the reset value of the NCO register 2176 of the NCO 2170, and is used in conjunction with the control signal PLLPRST.

[0362] The control signal PLLPRST, when applied to the NCO register 2176 in conjunction with the signal PLLPVAL, resets the 6 most significant bits of the NCO register 2176 to a value specified by 8 times PLLPVAL. The reset is performed by stepping the 6 MSB field of the NCO register 2176 up or down such that the specified value is reached after a minimum number of steps. Details of the phase reset logic block used to reset the value of the register 2176 of the NCO 2170 are shown in FIG. 22 and will be discussed later.

[0363] PLLFVAL is a 3-bit signal provided by the PHY Control system. It is to be interpreted as a 3-bit two's complement signed integer in the range [−4, 3]. It is used to specify the reset value of the frequency register 2154 of the integrator 2150 and is used in conjunction with the control signal PLLFRST.

[0364] The control signal PLLFRST, when applied to the frequency register 2154 of the integrator 2150 in conjunction with the signal PLLFVAL, resets the frequency register 2154 to the value 65536 times PLLFVAL.

[0365] The control signal PLLPRAMP loads the fixed number −2048 into the frequency register 2154 of the integrator 2150. This causes the phase of a sampling clock signal (and of the receive clock RCLK) to ramp at the fixed rate of −2 ppm. This is used during startup at the master constituent transceiver. PLLPRAMP overrides PLLFRST. In other words, if PLLPRAMP and PLLFRST are both applied, the value loaded into the frequency register 2154 is −2048, regardless of the value that PLLFRST tries to load.
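The load rules of [0363] to [0365] for the frequency register 2154 can be summarized in a few lines. The loaded values and the override priority come from the text; treating the register as holding plain integers in those units, and the function names, are assumptions for illustration.

```python
def decode_pllfval(raw: int) -> int:
    """Interpret a 3-bit PLLFVAL field as a two's-complement integer in [-4, 3]."""
    raw &= 0b111
    return raw - 8 if raw & 0b100 else raw

def next_freq_reg(current: int, pllfrst: bool, pllfval_raw: int, pllpramp: bool) -> int:
    """Value held by frequency register 2154 after the PHY Control inputs are applied."""
    if pllpramp:                                  # PLLPRAMP overrides PLLFRST
        return -2048
    if pllfrst:
        return 65536 * decode_pllfval(pllfval_raw)
    return current                                # no load: integrator keeps its value
```

For instance, PLLFVAL = 101 (binary) decodes to −3, so PLLFRST alone loads −196608, whereas asserting PLLPRAMP at the same time loads −2048 instead.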

[0366] FIG. 22 is a block diagram illustrating the phase reset logic block 2200 of the NCO 2170. The control signal PLLPRST is applied to the AND gate 2202. The output of the AND gate 2202 is applied to the increment/decrement enable input of the register 2176. The 3-bit value PLLPVAL from the PHY Control system of the gigabit transceiver is shifted left by 3 bits to form a 6-bit value 2204.

[0367] The current output of the register 2176 of the NCO 2170 (FIG. 21), which is the phase control signal inputted to the corresponding phase selector (FIG. 20), is subtracted from this shifted value of PLLPVAL via an adder 2206. Module 2208 determines whether the output of the adder 2206 is non-zero. If it is non-zero, module 2208 outputs a “1” to the AND gate 2202 to enable the enable input of the register 2176. If it is zero, module 2208 outputs a “0” to the AND gate 2202 to disable the enable input of the register 2176. Module 2210 determines whether the output of the adder 2206 is positive or negative. If it is positive, module 2210 outputs a count-up indicator to the register 2176. If it is negative, module 2210 outputs a count-down indicator to the register 2176.

[0368] The subtraction at the adder 2206 finds the shortest path from the current value of the NCO register 2176 to the shifted PLLPVAL 2204. For example, suppose the current phase value of the register 2176 is 20. If the shifted PLLPVAL 2204 (which is the desired value) is 32, the difference is 12, which is positive; therefore, the register 2176 is incremented. If the desired phase value is 56, the difference is 36, or “100100” in 6-bit binary, which is interpreted in two's complement as −28, so the register 2176 is decremented 28 consecutive times. The phase steps occur at the rate of one every 16 symbol periods. This single stepping is needed because of the way the phase selector operates: the phase selector can only increment or decrement from its current setting.
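The shortest-path stepping of FIG. 22 can be sketched with 6-bit modular arithmetic: the difference is taken modulo 64, and its top bit acts as the two's-complement sign that selects the count direction. The function names are hypothetical, and `current` stands for the 6 MSB field of the NCO register 2176.

```python
MASK6 = 0b111111  # 6-bit phase field

def phase_step(current: int, pllpval: int) -> int:
    """Direction of the next step of the 6 MSB field: +1, -1, or 0."""
    target = (pllpval & 0b111) << 3            # PLLPVAL shifted left by 3 bits (value 2204)
    diff = (target - current) & MASK6          # 6-bit difference from adder 2206
    if diff == 0:
        return 0                               # module 2208: stepping disabled
    return -1 if diff & 0b100000 else +1       # module 2210: sign bit of the difference

def steps_to_target(current: int, pllpval: int) -> list:
    """Register values visited, one step every 16 symbol periods, until the target is reached."""
    path = [current & MASK6]
    while True:
        d = phase_step(path[-1], pllpval)
        if d == 0:
            return path
        path.append((path[-1] + d) & MASK6)    # wraps from 0 down to 63
```

PLLPVAL values 4 and 7 reproduce the two cases of [0368]: starting from 20, the target 32 is reached after 12 increments, and the target 56 after 28 decrements that wrap through 0 and 63.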

[0369] FIG. 23 is a block diagram of an exemplary phase shifter logic block used for the phase control of the receive clock signal RCLK. The phase shifter logic block 2300 is needed when the signal RCLK Offset (FIG. 20) is used to adjust the phase of the receive clock signal RCLK. The signal RCLK Offset is a 6-bit signal provided by the PHY Control system, and specifies the amount by which the phase of RCLK must be shifted. Even if the signal RCLK Offset indicates a large phase shift, this phase shift must be transferred to the input of the phase selector 2050 (FIG. 20) one step at a time due to the way the phase selector operates. The change of phase of RCLK must occur in the direction indicated by a control signal STEPDIR generated by the PHY Control system.

[0370] The phase shifter logic block 2300 includes a comparator 2302, an offset register 2304 and the adder 2042 (the same adder indicated in FIG. 20). The comparator 2302 compares the output 2306 of the offset register 2304 with the signal RCLK Offset. If the two signals are equal, the comparator 2302 outputs a “0” to the enable input of the offset register 2304 to disable the up/down counting of the offset register 2304, thus keeping the output 2306 the same for the next time period. If the two signals are not equal, the comparator 2302 outputs a “1” to the enable input of the offset register 2304 to enable the up/down counting, causing the output 2306 to be incremented or decremented at the next time period. The signal STEPDIR from the PHY Control system is inputted to the up/down input of the offset register 2304 to control the counting direction. The output 2306 from the offset register 2304 is added to the phase control signal 2009 produced by the NCO 2008 (FIG. 20) via the adder 2042 to generate the phase control signal 2049 (FIGS. 23 and 20) for the RCLK phase selector 2050 (FIG. 20).
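A behavioral sketch of the phase shifter logic block 2300, assuming that the offset register 2304 behaves as a 6-bit up/down counter that wraps modulo 64 and that RCLK Offset is compared as a 6-bit code. Neither assumption is stated in the text, and the function names are hypothetical.

```python
MASK6 = 0b111111  # 6-bit phase words (64 phases)

def shift_rclk_offset(offset_reg: int, rclk_offset: int, stepdir_up: bool) -> int:
    """One update of offset register 2304 per time period."""
    if offset_reg == (rclk_offset & MASK6):   # comparator 2302 outputs 0: counting disabled
        return offset_reg
    step = 1 if stepdir_up else -1            # STEPDIR drives the up/down input
    return (offset_reg + step) & MASK6        # 6-bit up/down counting with wraparound

def rclk_phase_control(nco_phase_2009: int, offset_reg: int) -> int:
    """Adder 2042: NCO 2008 output plus offset register output gives phase control 2049."""
    return (nco_phase_2009 + offset_reg) & MASK6
```

Under the wraparound assumption, a STEPDIR that points the long way around still reaches the target, only more slowly: moving from 0 to 3 by counting down takes 61 steps instead of 3.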

[0371] The coupling of switching noise from the digital signal processor that implements the transceiver functions to each of the A/D converters is an important problem that needs to be addressed. Switching noise occurs when transistors switch states in accordance with transitions in the clock signal (or signals) that controls their operation. Switching noise in the digital section of the transceiver can be coupled to the analog section of the transceiver. Switching noise can cause severe degradation of the performance of an A/D converter if it occurs at or near the instant the A/D converter is sampling the received signal. The present invention, in addition to providing a timing recovery method and system, also provides a method and system for minimizing the degradation of the performance of the A/D converters caused by switching noise.

[0372] The effect of switching noise on an A/D converter can be reduced if the switching noise is synchronous (with a phase delay) with the sampling clock of the A/D converter. If, in addition, it is possible to adjust the phase of the sampling clock of the A/D converter with respect to the phase of the switching noise, then the phase of the sampling clock of the A/D converter can be optimized for minimum noise. It is noted that, for a local gigabit transceiver, the sampling clock signals ACLK0, ACLK1, ACLK2, ACLK3 are synchronous to each other (i.e., have the same frequency) because they are synchronous to the 4 transmitters of the remote transceiver, and these 4 remote transmitters are clocked by the same transmit clock signal TCLK. It is also important to note that the local receive clock signal RCLK is synchronous to the local sampling clock signals ACLK0, ACLK1, ACLK2, ACLK3.

[0373] Referring to FIGS. 2 and 5, the four A/D converters 216 of the four constituent transceivers are sampled with the sampling clock signals ACLK0, ACLK1, ACLK2, ACLK3. The phase of each of these sampling clock signals is determined by the subsystem 2100 (FIG. 21) of the timing recovery system 222 in response to the phase of the corresponding received signal, which depends on the remote transmitter and the line characteristics. Thus, the phases of the sampling clock signals change from line to line, and are not under the control of the system designer.

[0374] However, the relative phase of the receive clock signal RCLK with respect to the sampling clock signals ACLK0, ACLK1, ACLK2, ACLK3 can be controlled by adjusting the signal RCLK Offset (FIG. 20). The signal RCLK Offset can be used to select the RCLK phase that would cause the least noise coupling to the A/D converters 216 of FIG. 2. The underlying principle is the following. Referring to FIG. 2 and the boundaries of the clock domain, the entire digital signal processing, control and interface functions of the receiver operate in accordance with transitions in the receive clock signal RCLK. In other words, most of the digital logic circuits switch states on a transition of RCLK (more specifically, on a rising edge of RCLK). Only a small portion of the transceiver operates in accordance with transitions in the transmit clock signal TCLK. Therefore, most of the switching noise is synchronous with the receive clock signal RCLK. Since the receive clock signal RCLK is synchronous with the sampling clock signals ACLK0, ACLK1, ACLK2, ACLK3, it follows that most of the switching noise is synchronous with the sampling clock signals ACLK0, ACLK1, ACLK2, ACLK3. Therefore, if the phase of the receive clock signal RCLK is adjusted such that a transition in the signal RCLK occurs as far as possible in time from each of the sampling clock signals ACLK0, ACLK1, ACLK2, ACLK3, then the switching noise coupling to the A/D converters will be minimized.

[0375] The process for adjusting the phase of the receive clock signal RCLK can be summarized as follows. The process performs an exhaustive search over all the RCLK phases that, by design, can possibly exist in one symbol period. For each phase, the process computes the sum of the mean squared errors (MSEs) of the 4 pairs (i.e., the 4 constituent transceivers). At the end of the search, the process selects the RCLK phase that minimizes the sum of the MSEs of the four pairs. The following is a description of one embodiment of the RCLK phase adjustment process, where there are 64 possible RCLK phases.

[0376] FIG. 24 is a flowchart illustrating the process 2400 for adjusting the phase of the receive clock signal RCLK. Upon Start (block 2402), process 2400 initializes all the state variables (which include counters and registers), sets Offset to −32 (block 2404), sets Min_MSE equal to the MSE of the gigabit transceiver before any RCLK phase change, and sets BestOffset equal to zero. The MSE of the gigabit transceiver is the sum of the mean squared errors (MSEs) of the 4 constituent transceivers. The MSE of a constituent transceiver is the mean squared error of the corresponding 1D component of the 4D slicer error 42 (FIG. 2), and is outputted by an MSE computation block 2700 (FIG. 27) for every frame. Each frame is equal to 1024 symbol periods. This initialization is done within a duration of 1 frame. Process 2400 then waits for the effect of the RCLK phase change on the system to settle (block 2406). The duration of this waiting is 5 frames. Process 2400 then computes the MSE (by summing the MSEs of all four constituent transceivers outputted by the corresponding MSE computation blocks 2700 of FIG. 27) which corresponds to the current setting of RCLK Offset (block 2408). The duration of block 2408 is one frame. In block 2410, process 2400 compares the new MSE with Min_MSE. If the new MSE is strictly less than Min_MSE, then Min_MSE is set to the value of the new MSE and BestOffset is set to the value of Offset. In block 2412, process 2400 checks whether Offset is equal to 31, i.e., whether all 64 possible phase offsets have been searched. If Offset is not equal to 31, process 2400 increments Offset by 1 (block 2414), then continues the search for the best RCLK Offset by going back to block 2406. If Offset is equal to 31, that is, if process 2400 has searched all 64 possible phase offsets, process 2400 sets Offset equal to the value of BestOffset (block 2416), then terminates (block 2418). The duration of each of blocks 2414 and 2416 is 1 frame.
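The control flow of process 2400 can be sketched as a plain search loop. The callable `measure_total_mse` is a hypothetical stand-in for the hardware sequence “apply the offset, wait 5 frames, then sum the four per-pair MSEs over one frame”; the loop itself mirrors blocks 2404 to 2416, including the strictly-less-than comparison and the initial BestOffset of zero.

```python
from typing import Callable

def search_rclk_offset(measure_total_mse: Callable[[int], float],
                       initial_mse: float) -> int:
    """Exhaustive search of process 2400 over RCLK Offset = -32 .. 31."""
    min_mse = initial_mse          # MSE before any RCLK phase change (block 2404)
    best_offset = 0                # BestOffset starts at zero
    for offset in range(-32, 32):  # blocks 2412/2414: 64 offsets in total
        mse = measure_total_mse(offset)
        if mse < min_mse:          # block 2410: strictly less, so ties keep the earlier offset
            min_mse, best_offset = mse, offset
    return best_offset             # block 2416: Offset := BestOffset
```

Starting from the pre-search MSE means the procedure never selects an offset that performs worse than the original setting; if no offset beats it, the result stays at zero.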

[0377] After adjustment of the phase of the receive clock RCLK, small adjustments can be made to the phases of the sampling clocks ACLK1, ACLK2, ACLK3 to further reduce the coupling of switching noise to the A/D converters. Since the timing recovery system 222 of FIG. 20 without the ACLK0-3 Offsets, through the phase locked loop principle, already sets the sampling clocks at the optimal sampling positions with respect to the pulse shape of the incoming signals from the remote transceivers, the small phase adjustments made to the sampling clocks could cause some loss of performance of the A/D converters. However, the net result is still better than performing no phase adjustment of the sampling clocks and allowing the A/D converters to sample the incoming signals at a noisy instant when the transistors in the digital section are switching states. In the embodiment depicted in FIG. 20, no phase adjustment is made to the sampling clock ACLK0 because, by design of the structure of the embodiment, the phase difference between ACLK0 and RCLK is equal to RCLK Offset. Thus, in this embodiment, any adjustment to the phase of ACLK0 would also move RCLK away from the optimal position determined by process 2400 above by the same amount of phase adjustment.

[0378] FIGS. 25A, 25B, 25C illustrate three examples of the distribution of the transitions of clock signals within a symbol period to further clarify the concept of phase adjustment of the clock signals. It is noted that, in these examples, the four sampling clock signals ACLK0-3 are shown as occurring in their consecutive order within a symbol period for illustrative purposes only. It is understood that the sampling clock signals ACLK0-3 can occur in any order.

[0379] FIG. 25A is a first example of clock distribution where the transitions of the four sampling clock signals ACLK0-3 are evenly distributed within the symbol period of 8 nanoseconds (ns). Thus, each ACLK clock transition is 2 ns apart from an adjacent transition of another ACLK clock. Therefore, for this clock distribution example, a transition of the receive clock RCLK can only be placed at most 1 ns away from an adjacent ACLK transition. This “distance” (phase delay) may not be enough to reduce the coupling of switching noise to the two A/D converters associated with the two adjacent sampling clock signals (ACLK3 and ACLK0, in the example). In this case, it may be desirable to slightly adjust the phase of the two adjacent sampling clock signals to move their respective transitions further away from an RCLK transition, as illustrated by their new transition occurrences within a symbol period in FIG. 25A.

[0380] FIG. 25B is a second example of clock distribution where the transitions of the four sampling clock signals ACLK0-3 are distributed within the symbol period of 8 nanoseconds (ns) such that each ACLK clock transition is 1 ns apart from an adjacent transition of another ACLK clock. For this clock distribution example, a transition of the receive clock RCLK can be positioned midway between the last ACLK transition of one symbol period (ACLK3 in FIG. 25B) and the first ACLK transition of the next symbol period (ACLK0 in FIG. 25B). Because the four ACLK transitions span only 3 ns, this leaves a 5 ns gap, so the RCLK transition is 2.5 ns from an adjacent ACLK transition. This “distance” (phase delay) may be enough to reduce the coupling of switching noise to the two A/D converters associated with the two adjacent sampling clock signals (ACLK3 and ACLK0, in the example). In this case, phase adjustment of the two adjacent sampling clock signals to move their respective transitions further away from an RCLK transition may not be needed.

[0381] FIG. 25C is a third example of clock distribution where the transitions of the four sampling clock signals ACLK0-3 occur at the same instant within the symbol period of 8 nanoseconds (ns). In this clock distribution example, a transition of the receive clock RCLK can be positioned at the maximum possible distance of 4 ns (half a symbol period) from an adjacent ACLK transition. This is the best clock distribution, allowing maximum reduction of the coupling of switching noise to the four A/D converters associated with the sampling clock signals. In this case, there is no need for phase adjustment of the sampling clock signals.
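The distances quoted for FIGS. 25A to 25C can be checked with a short calculation. The sketch places ACLK0 at 0 ns and the other transitions as described in the three examples (these placements are illustrative assumptions), scans the 64 RCLK phases of one 8 ns symbol period (0.125 ns apart), and keeps the phase whose circular distance to the nearest ACLK transition is largest. The function names are hypothetical.

```python
from typing import Sequence, Tuple

SYMBOL_NS = 8.0   # symbol period
NUM_PHASES = 64   # possible RCLK phases per symbol period

def circ_dist(a: float, b: float, period: float = SYMBOL_NS) -> float:
    """Distance between two instants on a circle of length `period`."""
    d = abs(a - b) % period
    return min(d, period - d)

def best_rclk_edge(aclk_edges_ns: Sequence[float]) -> Tuple[float, float]:
    """Return (RCLK edge time, distance to the nearest ACLK edge), maximizing the distance."""
    candidates = [i * SYMBOL_NS / NUM_PHASES for i in range(NUM_PHASES)]

    def clearance(t: float) -> float:
        return min(circ_dist(t, e) for e in aclk_edges_ns)

    best = max(candidates, key=clearance)   # first candidate wins on ties
    return best, clearance(best)

for name, edges in (("25A", [0, 2, 4, 6]), ("25B", [0, 1, 2, 3]), ("25C", [0, 0, 0, 0])):
    t, d = best_rclk_edge(edges)
    print(f"FIG. {name}: RCLK at {t:.3f} ns, {d:.3f} ns from the nearest ACLK edge")
```

The results reproduce the text: 1 ns of clearance for the evenly spread case, 2.5 ns when the ACLK edges are bunched within 3 ns, and 4 ns when all four coincide. A real system does not know these edge positions, which is why process 2400 searches on measured MSE instead.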

[0382] For the embodiment shown in FIG. 20 of the timing recovery system 222 (FIG. 2), the following phase adjustment process is applied to the three sampling clock signals ACLK1, ACLK2, ACLK3. It is understood that, in a different embodiment of the timing recovery system 222 (FIG. 2) where the receive clock signal RCLK is not tied to one of the sampling clock signals ACLK0-3, the following phase adjustment process can be applied to all of the sampling clock signals.

[0383] The process for adjusting the phase of a sampling clock signal ACLKx (“x” in ACLKx denotes one of 0, 1, 2, 3) can be summarized as follows. The process performs a search over a small range of phases around the initial ACLKx phase. For each phase, the process logs the mean squared error (MSE) of the associated constituent transceiver. At the end of the search, the process selects the ACLKx phase that minimizes the MSE of the associated constituent transceiver.

[0384] Whenever the phase of a sampling clock signal ACLKx changes, the coefficients of the echo canceller 232 and of the NEXT cancellers 230 change. Thus, to avoid degradation of performance, the phase steps of the sampling clocks should be small so that the change they induce on the coefficients is also small. When the phase adjustment requires multiple consecutive phase steps, the convergence of the coefficients of the echo canceller 232 and of the NEXT cancellers 230 should be fast in order to avoid a buildup of coefficient mismatch.

[0385] FIG. 26 is a flowchart illustrating an embodiment of the process for adjusting the phase of a sampling clock signal ACLKx associated with one of the constituent transceivers, where the search is over a range of 16 phases around the initial ACLKx phase. For each of the constituent transceivers, process 2600 of FIG. 26 is run independently of, and concurrently with, the processes of the other constituent transceivers. Upon Start (block 2602), process 2600 initializes all the state variables (which include counters and registers), sets Offset to −8 (block 2604), sets Min_MSE equal to the MSE of the associated constituent transceiver before any ACLKx phase change, and sets BestOffset equal to zero. The MSE of the associated constituent transceiver is the mean squared error of the corresponding 1D component of the 4D slicer error 42 (FIG. 2). This initialization is done within a duration of 1 frame. Process 2600 then waits for the effect of the ACLKx phase change on the system to settle (block 2606). The duration of this waiting is 32 frames. Process 2600 then computes the MSE of the associated constituent transceiver which corresponds to the current setting of ACLKx Offset (block 2608). The duration of block 2608 is one frame. In block 2610, process 2600 compares the new MSE (outputted by the corresponding MSE computation block 2700 of FIG. 27) with Min_MSE. If the new MSE is strictly less than Min_MSE, then Min_MSE is set to the value of the new MSE and BestOffset is set to the value of Offset. In block 2612, process 2600 checks whether Offset is equal to 7, i.e., whether all 16 phase offsets in the range have been searched. If Offset is not equal to 7, process 2600 increments Offset by 1 (block 2614), then continues the search for the best ACLKx Offset by looping back to block 2606. If Offset is equal to 7, that is, if process 2600 has searched all 16 phase offsets in the range, process 2600 sets Offset equal to the value of BestOffset (block 2616), then terminates (block 2618). The duration of each of blocks 2614 and 2616 is 1 frame.
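Process 2600 differs from process 2400 mainly in scope: each pair searches its own 16 offsets using only its own MSE. The sketch below models that per-pair independence. The callable `measure_pair_mse` is a hypothetical stand-in for “apply the ACLKx offset, wait 32 frames, read that pair's MSE block 2700 for one frame”. The default pairs (1, 2, 3) reflect that ACLK0 is not adjusted in the FIG. 20 embodiment. Looping over the pairs sequentially gives the same result as the concurrent hardware only under the toy-model assumption that each pair's MSE depends on its own offset alone.

```python
from typing import Callable, Dict, Mapping, Sequence

def search_aclk_offsets(measure_pair_mse: Callable[[int, int], float],
                        initial_mse: Mapping[int, float],
                        pairs: Sequence[int] = (1, 2, 3)) -> Dict[int, int]:
    """Run a process-2600-style search (ACLKx Offset = -8 .. 7) for each pair independently."""
    best: Dict[int, int] = {}
    for x in pairs:
        min_mse, best_offset = initial_mse[x], 0   # block 2604
        for offset in range(-8, 8):                # blocks 2612/2614: 16 offsets
            mse = measure_pair_mse(x, offset)
            if mse < min_mse:                      # block 2610: strictly less
                min_mse, best_offset = mse, offset
        best[x] = best_offset                      # block 2616
    return best
```

In the accompanying test, pair 3 starts with an MSE lower than any offset can achieve, so it keeps offset 0. This reflects the same no-worse-than-before property as the RCLK search.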

[0386] FIG. 27 is a block diagram of an exemplary implementation of the MSE computation block used for computing the mean squared error of a constituent transceiver. In one embodiment of the gigabit transceiver, there are four MSE computation blocks, one for each of the four constituent transceivers. The four MSE computation blocks run independently and concurrently for the four constituent transceivers. The MSE computation block 2700 includes a squaring module 2702 and an infinite impulse response (IIR) filter 2704. The IIR filter 2704 includes an adder 2706, a feedback delay element 2708 and a forward delay element 2710. The squaring module 2702 receives the corresponding 1D component of the 4D slicer error 42 (FIG. 2), which is denoted as 42A for simplicity, and outputs the squared error value to the filter 2704. The filter 2704 accumulates the squared error values by adding, via the adder 2706, the current squared error value to the previously accumulated value stored in the feedback delay element 2708. The accumulated value is stored in the forward register 2710. In the exemplary embodiment shown in FIG. 27, the squared error values are accumulated for 1024 symbol periods (which is one frame of the PHY Control system). Since the accumulation period is long, the accumulated value is, up to the constant factor 1024, a good estimate of the mean squared error. At the end of the accumulation period, the clock signal 2720 from the PHY Control system clears the contents of the feedback delay element 2708 and clocks the forward delay element 2710, so that the forward delay element 2710 outputs the accumulated value MSE and the accumulation restarts from zero.
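A minimal sketch of the MSE computation block 2700 for one constituent transceiver, operating on a finished list of 1D slicer errors rather than a clocked stream (`frame_mse` is a hypothetical name). It returns the raw per-frame sum of squares. Because the search processes only compare MSE values with each other, the constant division by 1024 can be skipped.

```python
from typing import Iterable, List

FRAME = 1024  # symbol periods per PHY Control frame

def frame_mse(slicer_errors: Iterable[float], frame: int = FRAME) -> List[float]:
    """Sum of squared slicer errors over each complete frame (frame * mean squared error)."""
    out = []
    acc = 0.0                     # feedback delay element 2708
    for i, e in enumerate(slicer_errors, start=1):
        acc += e * e              # squaring module 2702 and adder 2706
        if i % frame == 0:        # frame clock 2720
            out.append(acc)       # forward delay element 2710 latches the sum
            acc = 0.0             # feedback element cleared for the next frame
    return out
```

For example, a constant slicer error of 0.5 yields 0.25 × 1024 = 256 per frame.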

[0387] While certain exemplary embodiments have been described in detail and shown in the accompanying drawings, it is to be understood that such embodiments are merely illustrative of and not restrictive on the broad invention. It will thus be recognized that various modifications may be made to the illustrated and other embodiments of the invention described above, without departing from the broad inventive scope thereof. It will be understood, therefore, that the invention is not limited to the particular embodiments or arrangements disclosed, but is rather intended to cover any changes, adaptations or modifications which are within the scope and spirit of the invention as defined by the appended claims.

What is claimed is:
 1. An integrated circuit communication device configured for operation over a multi-pair transmission channel, the communication device comprising: measurement circuitry configured to measure a performance degradation characteristic resulting from disabling each member of a set of sub-pluralities of a plurality of circuit elements; disabling circuitry configured to adaptively disable one or more of the sub-pluralities of the circuit elements until the performance degradation characteristic reaches a threshold level; and a decision feedback sequence estimation (DFSE) circuit, the DFSE decoding an input sample into a final decision corresponding to a codeword of a trellis code having N states, the DFSE including: a decoder circuit for decoding a set of signal samples to generate tentative decisions and the final decision; and a single state decision feedback equalizer.
 2. The integrated circuit communication device according to claim 1, the decision feedback equalizer coupled to the decoder circuit for receiving the tentative decisions, the single state decision feedback equalizer including: a set of low-ordered coefficients; and a set of high-ordered coefficients generating a tail value based on the tentative decisions and the input sample.
 3. The integrated circuit communication device according to claim 2, further comprising a state multiplication circuit, the state multiplication circuit expanding a single state representation of a signal received from the single state decision feedback equalizer into an N state representation suitable for decoding by the DFSE.
 4. The integrated circuit communication device according to claim 3, the state multiplication circuit comprising a multiple decision feedback equalizer coupled to the decision-feedback equalizer and generating an N state representation of signal samples in response to the tail value and the set of low-ordered coefficients received from the decision feedback equalizer.
 5. The integrated circuit communication device according to claim 1, the DFSE circuit further comprising: a Viterbi decoder for receiving the set of signal samples, the Viterbi decoder computing path metrics for each of the N states of the trellis code and outputting decisions based on the path metrics; and a path memory module coupled to the Viterbi decoder for receiving the decisions, the path memory module having a number of depth levels corresponding to consecutive time instants, each of the depth levels including N registers for storing decisions corresponding to the N states, each of selected depth levels including a multiplexer for selecting a best decision from corresponding N registers, the best decision at the last depth level being the final decision, the best decisions at other selected depth levels being the tentative decisions.
 6. The integrated circuit communication device according to claim 4, the multiple decision feedback equalizer comprising: a memory; a set of symbolic levels contained within the memory; and a convolution engine coupled to combine the set of low order coefficients with each member of the set of symbolic levels.
 7. An integrated circuit communication device configured for operation over a multi-pair transmission channel, the communication device comprising: measurement circuitry configured to measure a performance degradation characteristic resulting from disabling each member of a set of sub-pluralities of a plurality of circuit elements; disabling circuitry configured to adaptively disable one or more of the sub-pluralities of the circuit elements until the performance degradation characteristic reaches a threshold level; and a single state decision feedback equalizer.
 8. The integrated circuit communication device according to claim 7, the single state decision feedback equalizer having a set of ordered coefficients, the decision feedback equalizer defining a coefficient related tail value and a low order subset of coefficient values.
 9. The integrated circuit communication device according to claim 8, wherein the single state decision feedback equalizer has a width dimension D, wherein the width dimension D corresponds to the number of pairs defining the multi-pair transmission channel.
 10. The integrated circuit communication device according to claim 9, further comprising a state multiplication circuit, the state multiplication circuit expanding a single state representation output signal received from the single state decision feedback equalizer into an N state representation signal suitable for decoding by the DFSE.
 11. The integrated circuit communication device according to claim 10, the state multiplication circuit comprising: a convolution engine coupled to combine the low order subset of coefficient values with each member of a set of symbolic levels to define a first sample signal set; and a summing circuit coupled to combine the tail value with each member of the first sample signal set to define an N state representational set of signal samples.
 12. The integrated circuit communication device according to claim 7, further comprising: a control module controlling activation and deactivation of at least a portion of the sub-pluralities of the circuit elements according to a criterion, the criterion being based on at least one of an information error metric, a power metric, a specified error and a specified power; and a computing module coupled to the control module, the computing module computing at least one of the information error metric and the power metric.
 13. The integrated circuit communication device according to claim 10, wherein the criterion is the following: activate if the information error metric is greater than the specified error; and deactivate if the information error metric is smaller than the specified error.
 14. The integrated circuit communication device according to claim 13, wherein the criterion is the following: activate if the information error metric is greater than the specified error and the power metric is smaller than the specified power; and deactivate if the information error metric is smaller than the specified error or the power metric is greater than the specified power.
 15. The integrated circuit communication device according to claim 14, wherein the information error metric is related to a bit error rate of the communication system.
 16. An integrated circuit communication device configured for operation over a multi-pair transmission channel, the communication device comprising: a single state decision feedback equalizer having a set of ordered coefficients, the decision feedback equalizer defining a coefficient related tail value and a low order subset of coefficient values; a state multiplication circuit, the state multiplication circuit expanding a single state representation output signal received from the single state decision feedback equalizer into an N state representation signal suitable for decoding by the DFSE; a first ISI compensation circuit receiving an input signal and outputting a second signal substantially compensated for a first ISI component; and a second ISI compensation circuit, the second ISI compensation circuit receiving the second signal and generating a third signal, the third signal being substantially compensated for a second ISI component.
 17. The integrated circuit communication device according to claim 16, the first ISI compensation device comprising an equalizer circuit, including: an ISI compensation filter having a substantially inverse impulse response to the impulse response of a pulse shaping filter of a remote transmitter; and an adaptive gain stage.
 18. The integrated circuit communication device according to claim 16, the second ISI compensation device comprising a decision feedback sequence estimation circuit.
 19. The integrated circuit communication device according to claim 18, the decision feedback sequence estimation circuit comprising: a decoder circuit receiving and decoding at least one ISI compensated signal sample, and generating tentative decisions and a final decision; and a decision feedback equalizer coupled in feedback fashion to the decoder block, the decision feedback equalizer including a set of low-ordered coefficients and a set of high-ordered coefficients, the decision feedback equalizer generating a first portion of ISI compensation for the second ISI component based on the tentative decisions and the high-ordered coefficients.
 20. The integrated circuit communication device according to claim 19, wherein the decision feedback sequence estimation circuit further comprises a convolution engine coupled to the decision feedback equalizer to receive values of the low-ordered coefficients, the convolution engine computing a set of pre-computed values representing a set of potential second ISI compensation portions for the second ISI component.
 21. The integrated circuit communication device according to claim 20, wherein a second digital signal is combined with the first portion of ISI compensation to produce a third digital signal partially compensated for the second ISI component.
 22. The integrated circuit communication device according to claim 21, wherein the decision feedback sequence estimation circuit further comprises a multiple decision feedback equalizer coupled to the decision feedback equalizer and the convolution engine, the multiple decision feedback equalizer combining the set of pre-computed values with the third digital signal to produce a set of potential digital signals, one of the potential digital signals being substantially compensated for the second ISI component.
 23. The integrated circuit communication device according to claim 22, wherein the first ISI component represents ISI introduced by a remote transmission device, and wherein the second ISI component represents ISI introduced by transmission channel characteristics.
 24. An integrated circuit communication device configured for operation over a multi-pair transmission channel, the communication device comprising: measurement circuitry configured to measure a performance degradation characteristic resulting from disabling each member of a set of sub-pluralities of a plurality of circuit elements; disabling circuitry configured to adaptively disable one or more of the sub-pluralities of the circuit elements until the performance degradation characteristic reaches a threshold level; a first ISI compensation circuit configured to compensate for a transmitter induced ISI component; and a second ISI compensation circuit configured to compensate for a transmission channel induced ISI component.
 25. The integrated circuit communication device according to claim 24, the first ISI compensation device comprising an equalizer circuit, including: an ISI compensation filter having a substantially inverse impulse response to the impulse response of a pulse shaping filter of a remote transmitter; and an adaptive gain stage.
 26. The integrated circuit communication device according to claim 24, the second ISI compensation device comprising a decision feedback sequence estimation circuit.
 27. The integrated circuit communication device according to claim 26, the decision feedback sequence estimation circuit comprising: a decoder circuit receiving and decoding at least one ISI compensated signal sample, and generating tentative decisions and a final decision; and a decision feedback equalizer coupled in feedback fashion to the decoder block, the decision feedback equalizer including a set of low-ordered coefficients and a set of high-ordered coefficients, the decision feedback equalizer generating a first portion of ISI compensation for the second ISI component based on the tentative decisions and the high-ordered coefficients.
 28. The integrated circuit communication device according to claim 27, wherein the decision feedback sequence estimation circuit further comprises a convolution engine coupled to the decision feedback equalizer to receive values of the low-ordered coefficients, the convolution engine computing a set of pre-computed values representing a set of potential second ISI compensation portions for the second ISI component.
 29. The integrated circuit communication device according to claim 28, wherein a second digital signal is combined with the first portion of ISI compensation to produce a third digital signal partially compensated for the second ISI component.
 30. The integrated circuit communication device according to claim 29, wherein the decision feedback sequence estimation circuit further comprises a multiple decision feedback equalizer coupled to the decision feedback equalizer and the convolution engine, the multiple decision feedback equalizer combining the set of pre-computed values with the third digital signal to produce a set of potential digital signals, one of the potential digital signals being substantially compensated for the second ISI component.
 31. The integrated circuit communication device according to claim 30, wherein the first ISI component represents ISI introduced by a remote transmission device, and wherein the second ISI component represents ISI introduced by transmission channel characteristics.
 32. The integrated circuit communication device according to claim 24, further comprising: a control module controlling activation and deactivation of at least a portion of the sub-pluralities of the circuit elements according to a criterion, the criterion being based on at least one of an information error metric, a power metric, a specified error and a specified power; and a computing module coupled to the control module, the computing module computing at least one of the information error metric and the power metric.
 33. The integrated circuit communication device according to claim 32, wherein the criterion is the following: activate if the information error metric is greater than the specified error; and deactivate if the information error metric is smaller than the specified error.
 34. The integrated circuit communication device according to claim 33, wherein the criterion is the following: activate if the information error metric is greater than the specified error and the power metric is smaller than the specified power; and deactivate if the information error metric is smaller than the specified error or the power metric is greater than the specified power.
 35. The integrated circuit communication device according to claim 34, wherein the information error metric is related to a bit error rate of the communication system.
 36. An integrated circuit communication device configured for operation over a multi-pair transmission channel, the communication device comprising: measurement circuitry configured to measure a performance degradation characteristic resulting from disabling each member of a set of sub-pluralities of a plurality of circuit elements; disabling circuitry configured to adaptively disable one or more of the sub-pluralities of the circuit elements until the performance degradation characteristic reaches a threshold level; and a decoder system for computing the distance of a received symbolic word from a codeword.
 37. The integrated circuit communication device according to claim 36, further comprising: a control module controlling activation and deactivation of at least a portion of the sub-pluralities of the circuit elements according to a criterion, the criterion being based on at least one of an information error metric, a power metric, a specified error and a specified power; and a computing module coupled to the control module, the computing module computing at least one of the information error metric and the power metric.
 38. The integrated circuit communication device according to claim 37, wherein the criterion is the following: activate if the information error metric is greater than the specified error; and deactivate if the information error metric is smaller than the specified error.
 39. The integrated circuit communication device according to claim 38, wherein the criterion is the following: activate if the information error metric is greater than the specified error and the power metric is smaller than the specified power; and deactivate if the information error metric is smaller than the specified error or the power metric is greater than the specified power.
 40. The integrated circuit communication device according to claim 39, wherein the information error metric is related to a bit error rate of the communication system.
 41. The integrated circuit communication device according to claim 36, configured to receive information encoded in accordance with a multi-level symbolic scheme and over a multi-dimensional transmission channel, the decoder system comprising: an input, coupled to receive an input signal; a first slicer, coupled to detect the input signal with respect to a first one of two disjoint one-dimensional symbol-subsets; and a second slicer, coupled to detect the input signal with respect to a second one of the two disjoint one-dimensional symbol-subsets; wherein the first slicer outputs a first decision term and a first error term with respect to the first one of the two disjoint one-dimensional symbol-subsets, the second slicer outputting a second decision term and a second error term with respect to the second one of the two disjoint one-dimensional symbol-subsets; and wherein each of the first and second error terms is expressed by a digital representation having substantially fewer bits than the input signal.
 42. The symbol decoder according to claim 41, wherein each of the first and second error terms represents a distance metric between the input signal and a symbol in the respective one of the two disjoint one-dimensional symbol-subsets.
 43. The integrated circuit communication device according to claim 36, configured to receive information encoded in accordance with a multi-level symbolic scheme and over a multi-dimensional transmission channel, the decoder system comprising: an input to receive an input signal; a first slicer coupled to the input, the first slicer detecting the input signal with respect to a first one of two disjoint one-dimensional symbol-subsets; a second slicer coupled to the input, the second slicer detecting the input signal with respect to a second one of the two disjoint one-dimensional symbol-subsets; and a third slicer coupled to detect the input signal with respect to a union set of the two disjoint one-dimensional symbol-subsets.
 44. The integrated circuit communication device according to claim 43, wherein the first slicer outputs a first decision with respect to the first one of the two disjoint one-dimensional symbol-subsets, the second slicer outputting a second decision with respect to the second one of the two disjoint one-dimensional symbol-subsets, and wherein the third slicer outputs a third decision with respect to the union set of the two disjoint one-dimensional symbol-subsets.
 45. The integrated circuit communication device according to claim 44, further comprising: a first combination logic block configured to combine the first decision with the third decision, the first combination logic block defining a first error term; and a second combination logic block configured to combine the second decision with the third decision, the second combination logic block defining a second error term.
 46. The integrated circuit communication device according to claim 45, further comprising: a first square error generation block configured to operate on the first error term so as to define a square error representation thereof; and a second square error generation block configured to operate on the second error term so as to define a square error representation thereof.
 47. The integrated circuit communication device according to claim 46, wherein each of the error terms is expressed as a digital representation having one bit.
 48. An integrated circuit communication device configured for operation over a multi-pair transmission channel, the communication device comprising: a first ISI compensation circuit configured to compensate for a transmitter induced ISI component; a second ISI compensation circuit configured to compensate for a transmission channel induced ISI component; and a decoder system for computing the distance of a received symbolic word from a codeword.
 49. The integrated circuit communication device according to claim 48, wherein the first ISI compensation circuit comprises: an inverse partial response filter having an impulse response substantially an inverse of an impulse response of a pulse shaping filter of a remote transmitter, so as to substantially compensate an input digital signal for a first ISI component.
 50. The integrated circuit communication device according to claim 49, wherein the inverse partial response filter is implemented with a characteristic feedback gain factor K.
 51. The integrated circuit communication device according to claim 50, wherein the inverse partial response filter operates in accordance with a non-zero value of the characteristic feedback gain factor K during communication initialization and wherein the value of the feedback gain factor K is ramped down to zero after a pre-defined interval.
 52. The integrated circuit communication device according to claim 51, wherein the second ISI compensation circuit comprises: a Viterbi decoder configured to decode a digital signal and generate tentative decisions; and feedback equalizer circuitry coupled to the Viterbi decoder, the feedback equalizer circuitry receiving the tentative decisions and combining the tentative decisions with a set of high-ordered coefficients to generate a first value.
 53. The integrated circuit communication device according to claim 52, wherein the second ISI compensation circuit further comprises: summing circuitry combining the first value with a second digital signal, the summing circuitry outputting an intermediate signal; and a multiple decision feedback equalizer receiving the intermediate signal and combining the intermediate signal with a set of pre-computed values generated by combining values of a set of low-ordered coefficients with a set of values representing levels of a multi-level symbolic alphabet to produce a set of potential digital signals, one of the potential digital signals being substantially ISI compensated, the multiple decision feedback equalizer outputting said one of the potential digital signals to the Viterbi decoder.
 54. The integrated circuit communication device according to claim 53, wherein the characteristic feedback gain factor K is ramped to zero after convergence of the decision feedback equalizer.
 55. The integrated circuit communication device according to claim 48, the codeword being a concatenation of L symbols selected from two disjoint symbol-subsets X and Y, the codeword being included in one of a plurality of code-subsets, the received word being represented by L inputs, each of the L inputs uniquely corresponding to one of L dimensions, the decoder system comprising: a set of slicers for producing a set of one-dimensional errors from the L inputs, each of the one-dimensional errors representing a distance metric between one of the L inputs and a symbol in one of the two disjoint symbol-subsets; and a combining module for combining the one-dimensional errors to produce a set of L-dimensional errors such that each of the L-dimensional errors is a distance of the received word from a nearest codeword in one of the code-subsets.
 56. The integrated circuit communication device according to claim 55, wherein each of the one-dimensional errors is represented by substantially fewer bits than each of the L inputs.
 57. The integrated circuit communication device according to claim 55, wherein the slicers slice the L inputs with respect to each of the two disjoint symbol-subsets X and Y to produce a set of X-based errors, a set of Y-based errors and corresponding sets of X-based and Y-based decisions, the sets of X-based and Y-based errors forming the set of one-dimensional errors, the sets of X-based and Y-based decisions forming the set of one-dimensional decisions, each of the X-based and Y-based decisions being a symbol in a corresponding symbol-subset closest in distance to one of the L inputs, each of the one-dimensional errors representing a distance metric between a corresponding one-dimensional decision and one of the L inputs.
 58. The integrated circuit communication device according to claim 55, wherein the set of slicers comprises: first slicers for slicing each of the L inputs with respect to each of the two disjoint symbol-subsets X and Y to produce a set of X-based decisions and a set of Y-based decisions, the sets of X-based and Y-based decisions forming the set of one-dimensional decisions, each of the X-based and Y-based decisions being a symbol in a corresponding symbol-subset closest in distance to one of the L inputs; second slicers for slicing each of the L inputs with respect to a symbol-set comprising all symbols of the two disjoint symbol-subsets to produce a set of hard decisions; and error-computing modules for combining each of the sets of X-based and Y-based decisions with the set of hard decisions to produce the set of one-dimensional errors, each of the one-dimensional errors representing a distance metric between the corresponding one-dimensional decision and one of the L inputs.
 59. The integrated circuit communication device according to claim 55, wherein the combining module comprises: a first set of adders for combining the one-dimensional errors to produce two-dimensional errors; a second set of adders for combining the two-dimensional errors to produce intermediate L-dimensional errors, the intermediate L-dimensional errors being arranged into pairs of errors such that the pairs of errors correspond one-to-one to the code-subsets; and a minimum-select module for determining a minimum for each of the pairs of errors, the minima being the L-dimensional errors.
 60. An integrated circuit communication device configured for operation over a multi-pair transmission channel, the communication device comprising: a decision feedback sequence estimation (DFSE) circuit, for decoding an input sample into a final decision corresponding to a codeword of a trellis code having N states, the DFSE including a single state decision feedback equalizer; and a decoder system for computing the distance of a received symbolic word from a codeword.
 61. The integrated circuit communication device according to claim 60, the single state decision feedback equalizer having a set of ordered coefficients, the decision feedback equalizer defining a coefficient related tail value and a low order subset of coefficient values.
 62. The integrated circuit communication device according to claim 61, wherein the single state decision feedback equalizer has a width dimension D, wherein the width dimension D corresponds to the number of pairs defining the multi-pair transmission channel.
 63. The integrated circuit communication device according to claim 62, further comprising a state multiplication circuit, the state multiplication circuit expanding a single state representation output signal received from the single state decision feedback equalizer into an N state representation signal suitable for decoding by the DFSE.
 64. The integrated circuit communication device according to claim 60, configured to receive information encoded in accordance with a multi-level symbolic scheme and over a multi-dimensional transmission channel, the decoder system comprising: an input, coupled to receive an input signal; a first slicer, coupled to detect the input signal with respect to a first one of two disjoint one-dimensional symbol-subsets; and a second slicer, coupled to detect the input signal with respect to a second one of the two disjoint one-dimensional symbol-subsets; wherein the first slicer outputs a first decision term and a first error term with respect to the first one of the two disjoint one-dimensional symbol-subsets, the second slicer outputting a second decision term and a second error term with respect to the second one of the two disjoint one-dimensional symbol-subsets; and wherein each of the first and second error terms is expressed by a digital representation having substantially fewer bits than the input signal.
 65. The symbol decoder according to claim 64, wherein each of the first and second error terms represents a distance metric between the input signal and a symbol in the respective one of the two disjoint one-dimensional symbol-subsets.
 66. The integrated circuit communication device according to claim 60, configured to receive information encoded in accordance with a multi-level symbolic scheme and over a multi-dimensional transmission channel, the decoder system comprising: an input to receive an input signal; a first slicer coupled to the input, the first slicer detecting the input signal with respect to a first one of two disjoint one-dimensional symbol-subsets; a second slicer coupled to the input, the second slicer detecting the input signal with respect to a second one of the two disjoint one-dimensional symbol-subsets; and a third slicer coupled to detect the input signal with respect to a union set of the two disjoint one-dimensional symbol-subsets.
 67. The integrated circuit communication device according to claim 66, wherein the first slicer outputs a first decision with respect to the first one of the two disjoint one-dimensional symbol-subsets, the second slicer outputting a second decision with respect to the second one of the two disjoint one-dimensional symbol-subsets, and wherein the third slicer outputs a third decision with respect to the union set of the two disjoint one-dimensional symbol-subsets.
 68. The integrated circuit communication device according to claim 67, further comprising: a first combination logic block configured to combine the first decision with the third decision, the first combination logic block defining a first error term; and a second combination logic block configured to combine the second decision with the third decision, the second combination logic block defining a second error term.
 69. The integrated circuit communication device according to claim 68, further comprising: a first square error generation block configured to operate on the first error term so as to define a square error representation thereof; and a second square error generation block configured to operate on the second error term so as to define a square error representation thereof.
 70. The integrated circuit communication device according to claim 69, wherein each of the error terms is expressed as a digital representation having one bit.
 71. An integrated circuit communication device configured for operation over a multi-pair transmission channel, the communication device comprising: a decision feedback sequence estimation (DFSE) circuit, for decoding an input sample into a final decision corresponding to a codeword of a trellis code having N states, the DFSE including a single state decision feedback equalizer; a first ISI compensation circuit configured to compensate for a transmitter induced ISI component; a second ISI compensation circuit configured to compensate for a transmission channel induced ISI component; and adaptive circuitry for reducing power consumption of a filter, the filter having an initial set of active coefficients, an input and an output, the active coefficients being ordered, a lowest ordered active coefficient of the initial set being proximal to the input, each of the active coefficients having a stable value.
 72. The integrated circuit communication device according toclaim 71, further comprising: a threshold module generating a threshold;a comparing module coupled to the threshold module, the comparing modulecomparing an active coefficient with the threshold; and a decisionmodule coupled to the comparing module, the decision module deactivatingthe active coefficient according to a criterion.
 73. The integratedcircuit communication device according to claim 72, wherein the decisionmodule deactivates the active coefficient if the active coefficient hasa value smaller than the threshold.
 74. The integrated circuitcommunication device according to claim 73, further comprising: a bufferproviding a specified error; an error computing module computing a errormetric; and a second comparing module coupled to the buffer, the errorcomputing module and the threshold module, the second comparing modulecomparing the error metric with the specified error and producing afirst control signal to the threshold module when the error metric issmaller than the specified error and a second control signal to thethreshold module when the error metric is larger than the specifiederror.
 75. The integrated circuit communication device according toclaim 74, wherein the threshold module updates the threshold uponreception of the first or second control signal.
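Claims 72 to 75 describe a threshold-driven loop for truncating filter taps. A minimal Python sketch of that loop follows. It assumes that coefficients are compared with the threshold by magnitude, that a deactivated tap stays off, and that the threshold is raised by a fixed step when the error metric is below the specified error (margin available) and lowered otherwise. The class name, field names and step size are hypothetical and are not taken from the claims.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AdaptiveTapTruncator:
    """Hypothetical model of the threshold, comparing and decision modules (claims 72-75)."""

    threshold: float
    specified_error: float  # value held in the "buffer" of claim 74
    step: float             # assumed fixed threshold update step
    active: List[bool] = field(default_factory=list)

    def deactivate_small_taps(self, coeffs: List[float]) -> List[bool]:
        # Decision module: switch off any active tap whose magnitude is below the threshold.
        # Assumption: once a tap is deactivated it is not re-enabled here.
        if not self.active:
            self.active = [True] * len(coeffs)
        for i, c in enumerate(coeffs):
            if self.active[i] and abs(c) < self.threshold:
                self.active[i] = False
        return self.active

    def update_threshold(self, error_metric: float) -> float:
        # Second comparing module: error below spec -> first control signal (raise threshold);
        # error above spec -> second control signal (lower threshold, never below zero).
        if error_metric < self.specified_error:
            self.threshold += self.step
        elif error_metric > self.specified_error:
            self.threshold = max(0.0, self.threshold - self.step)
        return self.threshold
```

Raising the threshold while margin remains lets more taps be switched off over time; lowering it when the error exceeds the specification stops further truncation.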
 76. The integrated circuit communication device according to claim 70, wherein the first ISI compensation circuit comprises: an inverse partial response filter having an impulse response substantially an inverse of an impulse response of a pulse shaping filter of a remote transmitter, so as to substantially compensate an input digital signal for a first ISI component.
 77. The integrated circuit communication device according to claim 76, wherein the inverse partial response filter is implemented with a characteristic feedback gain factor K.
 78. The integrated circuit communication device according to claim 77, wherein the inverse partial response filter operates in accordance with a non-zero value of the characteristic feedback gain factor K during communication initialization and wherein the value of the feedback gain factor K is ramped down to zero after a pre-defined interval.
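The feedback gain K of claims 77 and 78 can be illustrated by assuming that the remote pulse-shaping filter has the first-order form a(1 + K z^-1). Its inverse is then the recursion y[n] = x[n]/a - K y[n-1]. The Python sketch below implements that recursion and adds a hypothetical linear schedule for ramping K down to zero. The function names, the first-order filter form and the linear ramp are all assumptions made for illustration.

```python
from typing import List, Sequence


def inverse_partial_response(x: Sequence[float], a: float, k: float) -> List[float]:
    """Invert an assumed pulse-shaping filter a * (1 + k * z^-1).

    Recursion: y[n] = x[n] / a - k * y[n-1], with y[-1] = 0.
    """
    y: List[float] = []
    prev = 0.0
    for sample in x:
        prev = sample / a - k * prev  # feedback path scaled by the gain factor K
        y.append(prev)
    return y


def ramp_k(k0: float, steps: int) -> List[float]:
    # Hypothetical linear ramp of K from k0 down to zero in `steps` equal decrements.
    return [k0 * (steps - i) / steps for i in range(steps + 1)]
```

For example, symbols [1, 2, 3] passed through 0.5 + 0.25 z^-1 (that is, a = 0.5, K = 0.5) give [0.5, 1.25, 2.0], and the recursion recovers [1.0, 2.0, 3.0].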
 79. The integrated circuit communication device according to claim 78, wherein the second ISI compensation circuit comprises: a Viterbi decoder configured to decode a digital signal and generate tentative decisions; and feedback equalizer circuitry coupled to the Viterbi decoder, the feedback equalizer circuitry receiving the tentative decisions and combining the tentative decisions with a set of high-ordered coefficients to generate a first value.
 80. The integrated circuit communication device according to claim 79, wherein the second ISI compensation circuit further comprises: summing circuitry combining the first value with a second digital signal, the summing circuitry outputting an intermediate signal; and a multiple decision feedback equalizer receiving the intermediate signal and combining the intermediate signal with a set of pre-computed values generated by combining values of a set of low-ordered coefficients with a set of values representing levels of a multi-level symbolic alphabet to produce a set of potential digital signals, one of the potential digital signals being substantially ISI compensated, the multiple decision feedback equalizer outputting said one of the potential digital signals to the Viterbi decoder.
 81. The integrated circuit communication device according to claim 80, wherein the characteristic feedback gain factor K is ramped to zero after convergence of the decision feedback equalizer.
 82. The integrated circuit communication device according to claim 70, the single state decision feedback equalizer having a set of ordered coefficients, the decision feedback equalizer defining a coefficient related tail value and a low order subset of coefficient values.
 83. The integrated circuit communication device according to claim 82, wherein the single state decision feedback equalizer has a width dimension D, wherein the width dimension D corresponds to the number of pairs defining the multi-pair transmission channel.
 84. The integrated circuit communication device according to claim 83, further comprising a state multiplication circuit, the state multiplication circuit expanding a single state representation output signal received from the single state decision feedback equalizer into an N state representation signal suitable for decoding by the DFSE.
 85. The integrated circuit communication device according to claim 84, the state multiplication circuit comprising: a convolution engine coupled to combine the low order subset of coefficient values with each member of a set of symbolic levels to define a first sample signal set; and a summing circuit coupled to combine the tail value with each member of the first sample signal set to define an N state representational set of signal samples.
 86. An integrated circuit communication device configured for operation over a multi-pair transmission channel, the communication device comprising: measurement circuitry configured to measure a performance degradation characteristic resulting from disabling each member of a set of sub-pluralities of a plurality of circuit elements; disabling circuitry configured to adaptively disable one or more of the sub-pluralities of the circuit elements until the performance degradation characteristic reaches a threshold level; a single state decision feedback equalizer; a first ISI compensation circuit configured to compensate for a transmitter induced ISI component; a second ISI compensation circuit configured to compensate for a transmission channel induced ISI component; and a decoder system for computing the distance of a received symbolic word from a codeword.
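One reading of the adaptive disabling in claim 86 is a greedy search: measure the degradation that each remaining group of circuit elements would add if disabled, disable the cheapest one, and stop before the degradation characteristic reaches the threshold level. The Python sketch below assumes that reading. `adaptively_disable` and its `degradation` callback are hypothetical; the callback stands in for the measurement circuitry.

```python
from typing import Callable, List, Sequence


def adaptively_disable(groups: Sequence[str],
                       degradation: Callable[[List[str]], float],
                       threshold: float) -> List[str]:
    """Greedy sketch: disable groups one at a time while degradation stays below threshold."""
    disabled: List[str] = []
    remaining = list(groups)
    while remaining:
        # Measure the degradation that disabling each remaining group would produce.
        trial = {g: degradation(disabled + [g]) for g in remaining}
        best = min(remaining, key=lambda g: trial[g])
        if trial[best] >= threshold:
            break  # any further disabling would reach the threshold level
        disabled.append(best)
        remaining.remove(best)
    return disabled
```

A real implementation could stop at, rather than just before, the threshold; the claim wording "until ... reaches a threshold level" admits either choice.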
 87. A method for reducing system performance degradation due to switching noise in a system, the system comprising a set of subsystems, each of the subsystems comprising an analog section and a digital section, each of the analog sections operating in accordance with a corresponding one of a set of sampling clock signals, the sampling clock signals being synchronous in frequency, the digital sections operating in accordance with a receive clock signal, the method comprising the operations of: generating the receive clock signal such that the receive clock signal is synchronous in frequency with the sampling clock signals and has a phase offset with respect to one of the sampling clock signals; and adjusting the phase offset such that system performance degradation due to coupling of switching noise from the digital sections to the analog sections is substantially minimized.
 88. The method of claim 87 wherein, in the operation of adjusting the phase offset, the phase offset is adjusted such that a time difference between a transition occurrence of the receive clock signal and transition occurrences of sampling clock signals, that are adjacent in time to the transition occurrence of the receive clock signal, is substantially maximized.
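The criterion of claim 88 can be made concrete by placing clock edges on a grid of phase positions within one symbol period. The Python sketch below assumes 64 positions, matching the 64 phase offset values of claim 90, and picks the receive clock phase whose nearest sampling clock edge is farthest away. The phase offset of claim 87 is then that phase minus the reference sampling clock's phase. The function names are hypothetical.

```python
from typing import Sequence, Tuple


def circular_distance(a: int, b: int, num_phases: int) -> int:
    # Distance between two edge positions on a circle of num_phases positions.
    d = (a - b) % num_phases
    return min(d, num_phases - d)


def best_receive_phase(sampling_phases: Sequence[int],
                       num_phases: int = 64) -> Tuple[int, int]:
    """Return (phase, margin): the receive clock edge position that maximizes the
    distance to the nearest sampling clock edge, and that distance."""
    best_phase, best_margin = 0, -1
    for candidate in range(num_phases):
        margin = min(circular_distance(candidate, p, num_phases) for p in sampling_phases)
        if margin > best_margin:  # keep the first candidate on ties
            best_phase, best_margin = candidate, margin
    return best_phase, best_margin
```

With four evenly spread sampling clocks at positions 0, 16, 32 and 48, the best receive clock edge sits halfway between two of them, 8 positions from each.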
 89. The method of claim 87 wherein the operation of adjusting the phase offset of the receive clock comprises the operations of: (1) determining a set of phase offset values for the phase offset; (2) computing a set of system performance errors corresponding one-to-one to the phase offset values; and (3) selecting one of the phase offset values, said one phase offset value corresponding to a minimum of the system performance errors.
 90. The method of claim 89 wherein the set of phase offset values comprises 64 phase offset values.
 91. The method of claim 89 wherein operation (2) comprises the operations of: computing a subsystem performance error, corresponding to one of the phase offset values, for each of the subsystems; combining the subsystem performance errors to generate the corresponding system performance error.
 92. The method of claim 91 wherein the operation of computing a subsystem performance error for a corresponding subsystem comprises: squaring a slicer error associated with the subsystem; accumulating a number of associated squared slicer errors via a filter for a period of time; and outputting an accumulated squared error as the subsystem performance error after the period of time.
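Claims 89 to 92 describe a sweep: try each phase offset, accumulate squared slicer errors for each subsystem, combine them into a system error, and keep the offset with the smallest total. A minimal Python sketch follows. `measure(offset, k)` is a hypothetical hook returning the slicer errors captured for subsystem k while the receive clock runs at that offset, and summation is assumed as the combining rule.

```python
from typing import Callable, List, Sequence


def subsystem_error(slicer_errors: Sequence[float]) -> float:
    # Square each slicer error and accumulate over the measurement window (claim 92).
    total = 0.0
    for e in slicer_errors:
        total += e * e
    return total


def select_phase_offset(measure: Callable[[int, int], Sequence[float]],
                        num_subsystems: int, num_offsets: int = 64) -> int:
    """Sweep the offsets and return the one with the minimum system performance error."""
    system_errors: List[float] = []
    for offset in range(num_offsets):
        # Combine the per-subsystem errors by summation (assumed combining rule).
        err = sum(subsystem_error(measure(offset, k)) for k in range(num_subsystems))
        system_errors.append(err)
    return min(range(num_offsets), key=system_errors.__getitem__)
```

The same sweep, applied per subsystem over 16 sampling phase values, matches the structure of claims 94 to 96.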
 93. The method of claim 87 further comprising the operation of: adjusting a sampling phase of at least one of the sampling clock signals such that a subsystem performance error of the subsystem which corresponds to said one of the sampling clock signals is substantially minimized.
 94. The method of claim 93 wherein the operation of adjusting the sampling phase of at least one of the sampling clock signals comprises the operations of: (1) determining a set of sampling phase values for the sampling phase; (2) computing a set of subsystem performance errors corresponding one-to-one to the sampling phase values; and (3) selecting one of the sampling phase values, said one sampling phase value corresponding to a minimum of the subsystem performance errors.
 95. The method of claim 94 wherein the set of sampling phase values comprises 16 sampling phase values.
 96. The method of claim 94 wherein the operation of computing a subsystem performance error for the corresponding subsystem comprises: squaring a slicer error associated with the subsystem; accumulating a number of associated squared slicer errors via a filter for a period of time; and outputting an accumulated squared error as the subsystem performance error after the period of time.
 97. The method of claim 87 further comprising the operation of: adjusting a sampling phase of each of the sampling clock signals such that a subsystem performance error of a corresponding subsystem is substantially minimized.
 98. A method for reducing effect of switching noise in a system, the system comprising a set of subsystems, each of the subsystems comprising an analog section and a digital section, each of the analog sections operating in accordance with a corresponding one of a set of sampling clock signals, the digital sections operating in accordance with a receive clock signal, the method comprising the operations of: generating the sampling clock signals such that the sampling clock signals are synchronous in frequency with each other; generating the receive clock signal such that the receive clock signal is synchronous in frequency with the sampling clock signals and has a phase offset with respect to one of the sampling clock signals; and adjusting the phase offset such that effect of switching noise from the digital sections on the analog sections is substantially minimized.
 99. The method of claim 98 wherein, in the operation of adjusting the phase offset, the phase offset is adjusted such that time difference between a transition occurrence of the receive clock signal and transition occurrences of sampling clock signals that are adjacent in time to the transition occurrence of the receive clock signal is substantially maximized.
 100. The method of claim 98 further comprising the operation of: adjusting a phase of at least one of the sampling clock signals such that a subsystem performance error of the subsystem which corresponds to said one of the sampling clock signals is substantially minimized.
 101. The method of claim 98 wherein the operation of generating the sampling clock signals comprises the operations of: (a) generating a phase error for each of the sampling clock signals from a corresponding phase detector; (b) inputting each of the phase errors to a corresponding loop filter; (c) generating filtered phase errors from the corresponding loop filters; (d) inputting each of the filtered phase errors to a corresponding oscillator; (e) generating phase control signals from the corresponding oscillators; (f) inputting each of the phase control signals to a corresponding phase selector; and (g) generating the sampling clock signals from the corresponding phase selectors.
 102. The method of claim 101 wherein the operation of generating the receive clock signal comprises the operations of: (1) combining one of the phase control signals with the phase offset to produce a phase shift value; (2) inputting the phase shift value to a receive clock phase selector; and (3) generating the receive clock signal from the receive clock phase selector.
 103. The method of claim 102 wherein the phase shift value comprises a set of phase steps and wherein operation (2) comprises the operation of inputting the phase steps consecutively to the receive clock phase selector.
 104. The method of claim 102 wherein the operation of adjusting the phase offset of the receive clock comprises the operations of: (4) determining a set of phase offset values for the phase offset; (5) computing a set of system performance errors corresponding one-to-one to the phase offset values; and (6) selecting one of the phase offset values, said one phase offset value corresponding to a minimum of the system performance errors.
 105. The method of claim 104 wherein the set of phase offset values comprises 64 phase offset values.
 106. The method of claim 104 wherein operation (5) comprises the operations of: computing a subsystem performance error for each of the subsystems for one of the phase offset values; combining the subsystem performance errors to generate the corresponding system performance error.
 107. The method of claim 106 wherein the operation of computing a subsystem performance error for a corresponding subsystem comprises: squaring a slicer error associated with the subsystem; accumulating a number of associated squared slicer errors via a filter for a period of time; and outputting an accumulated squared error as the subsystem performance error after the period of time.
 108. The method of claim 101 wherein, in operation (a), each of the phase detectors receives a corresponding slicer error and a corresponding tentative decision from a decoding system.
 109. The method of claim 108 wherein operation (a) comprises: (1) generating a pre-cursor phase error by multiplying the corresponding tentative decision by a delayed version of the corresponding slicer error; (2) generating a post-cursor phase error by multiplying the corresponding slicer error by a delayed version of the corresponding tentative decision; and (3) combining the pre-cursor and post-cursor phase errors to produce the corresponding phase error.
 110. The method of claim 109 wherein operations (1), (2) and (3) are performed via a lattice structure, the lattice structure comprising two delay elements, two multipliers and an adder.
 111. The method of claim 110 wherein operation (3) includes the operation of combining the pre-cursor, post-cursor phase errors and an offset input from a control unit to produce the corresponding phase error.
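The lattice phase detector of claims 109 to 111 can be sketched in Python with two delay registers, two products and one sum. The claims say only that the pre-cursor and post-cursor terms are "combined". This sketch assumes a Mueller-Müller-style difference, post-cursor minus pre-cursor, plus the control-unit offset. Writing the received sample as y = a + e (tentative decision plus slicer error), e[n]a[n-1] - a[n]e[n-1] equals y[n]a[n-1] - y[n-1]a[n] because the a[n]a[n-1] terms cancel. The class and method names are hypothetical.

```python
class LatticePhaseDetector:
    """Sketch of a two-delay, two-multiplier, one-adder phase detector (claims 109-111).

    Assumption: output = post_cursor - pre_cursor + offset.
    """

    def __init__(self, offset: float = 0.0) -> None:
        self.prev_error = 0.0     # delay element holding the previous slicer error
        self.prev_decision = 0.0  # delay element holding the previous tentative decision
        self.offset = offset      # offset input from a control unit

    def step(self, slicer_error: float, decision: float) -> float:
        pre_cursor = decision * self.prev_error         # tentative decision x delayed slicer error
        post_cursor = slicer_error * self.prev_decision  # slicer error x delayed tentative decision
        self.prev_error, self.prev_decision = slicer_error, decision
        return post_cursor - pre_cursor + self.offset
```

A non-zero offset biases the loop's lock point, which is one way a control unit could nudge the sampling phase.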
 112. The method of claim101 wherein operation (c) comprises: accumulating a number ofconsecutive values of one of the phase errors via a first filter,resulting in a sum value; outputting the sum value from the firstfilter; integrating the sum value via a second filter to produce anintegral value; and combining the sum value and the integral value toproduce a filtered phase error.
 113. The method of claim 112 whereinoperation (3) includes the operation of scaling the integrated sum valueby a scale factor to produce the integral value.
 114. The method ofclaim 112 wherein operation (c) further comprises, before operation (3),the operation of multiplying the sum value by a factor different than 1when the system is operating in a different bandwidth mode.
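The loop filter of claims 112 to 114 is a proportional-plus-integral structure with a block accumulator in front. A Python sketch follows. It assumes the first filter emits one sum value every `block_len` phase errors and that the bandwidth-mode factor scales that sum before it reaches either path. The class and parameter names are hypothetical.

```python
from typing import Optional


class LoopFilter:
    """Proportional-plus-integral loop filter sketch (claims 112-114)."""

    def __init__(self, block_len: int, integral_scale: float,
                 bandwidth_factor: float = 1.0) -> None:
        self.block_len = block_len                # consecutive phase errors per sum (assumed)
        self.integral_scale = integral_scale      # scale factor on the integrated sum
        self.bandwidth_factor = bandwidth_factor  # differs from 1 in another bandwidth mode
        self._acc = 0.0
        self._count = 0
        self._integrator = 0.0

    def push(self, phase_error: float) -> Optional[float]:
        # First filter: accumulate block_len consecutive phase errors into a sum value.
        self._acc += phase_error
        self._count += 1
        if self._count < self.block_len:
            return None  # no filtered output until the block is complete
        sum_value = self._acc * self.bandwidth_factor
        self._acc, self._count = 0.0, 0
        # Second filter: integrate the sum value, then scale it to form the integral value.
        self._integrator += sum_value
        integral_value = self.integral_scale * self._integrator
        # Adder: proportional (sum) path plus integral path gives the filtered phase error.
        return sum_value + integral_value
```

The proportional path tracks phase steps, while the integral path removes a steady frequency offset between the local and remote clocks.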
 115. The method of claim 101 wherein operation (e) comprises the operation of filtering recursively the filtered phase errors to produce the corresponding phase control signals.
 116. The method of claim 115 wherein operation (e) further comprises the operation of scaling, before filtering recursively, the filtered phase errors by a scale factor.
 117. The method of claim 101 wherein operation (g) comprises the operations of: inputting a multi-phase input signal from a clock generator to each of the phase selectors; and selecting at each of the phase selectors one of the phases of the multi-phase input signal based on the phase control signal received from the corresponding oscillator.
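For claims 115 to 117, the oscillator can be modeled as a scaled integrator, which is a first-order recursive (IIR) filter with its pole at z = 1, whose rounded output indexes one phase of the multi-phase clock. The Python sketch assumes 64 selectable phases and nearest-integer rounding. The class and function names are hypothetical.

```python
from typing import Sequence


class PhaseAccumulatorOscillator:
    """Sketch of claims 115-117: a scaled recursive accumulator drives a phase selector."""

    def __init__(self, scale: float, num_phases: int = 64) -> None:
        self.scale = scale            # scale factor applied before the recursion (claim 116)
        self.num_phases = num_phases  # assumed number of phases from the clock generator
        self.phase = 0.0              # recursive state: phase[n] = phase[n-1] + scale * x[n]

    def step(self, filtered_phase_error: float) -> int:
        self.phase += self.scale * filtered_phase_error
        # Phase control signal: index of the clock-generator phase to select, wrapped modulo num_phases.
        return round(self.phase) % self.num_phases


def select_phase(multiphase_clock: Sequence[str], control: int) -> str:
    # Phase selector: pick one phase of the multi-phase input based on the control signal.
    return multiphase_clock[control]
```

Wrapping the index modulo the number of phases lets the selected phase slip continuously in either direction, which is how a phase-selecting loop tracks a small frequency offset.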
 118. A method for generating a set of clock signals in a system, the set of clock signals comprising a set of sampling clock signals, the system comprising a set of subsystems, each of the subsystems comprising an analog section, each of the analog sections operating in accordance with a corresponding one of the sampling clock signals, the method comprising the operations of: generating a phase error for each of the sampling clock signals from a corresponding phase detector; inputting each of the phase errors to a corresponding loop filter; generating filtered phase errors from the corresponding loop filters; inputting each of the filtered phase errors to a corresponding oscillator; generating phase control signals from the corresponding oscillators; inputting each of the phase control signals to a corresponding phase selector; and generating the sampling clock signals from the corresponding phase selectors.
 119. The method of claim 118 wherein the set of clock signals further comprises a receive clock signal and wherein each of the subsystems further comprises a digital section, the digital sections operating in accordance with the receive clock signal.
 120. The method of claim 119 wherein the receive clock signal is related to one of the sampling clock signals.
 121. The method of claim 120 further comprising the operations of: combining one of the phase control signals with a receive clock offset to produce a phase shift value; inputting the phase shift value to a receive clock phase selector; and generating the receive clock signal from the receive clock phase selector.
 122. The method of claim 121 wherein the phase shift value comprises a set of phase steps and wherein the inputting operation comprises the operation of inputting one phase step of the phase shift value at a time to the receive clock phase selector.
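Claims 121 and 122 have the receive clock phase selector move to the phase control signal plus the receive clock offset one phase step at a time. A minimal Python sketch of generating those steps follows, assuming 64 phases and that the selector takes the shorter way around the phase circle. `phase_step_sequence` is a hypothetical name. Presumably, stepping avoids a single large, abrupt phase change in the clock that drives the digital sections.

```python
from typing import List


def phase_step_sequence(current: int, target: int, num_phases: int = 64) -> List[int]:
    """Phase selector settings visited when moving from current to target one step at a time.

    Assumption: steps go the shorter way around the circle of num_phases phases.
    """
    diff = (target - current) % num_phases
    if diff > num_phases // 2:
        diff -= num_phases  # stepping backwards is shorter
    step = 1 if diff >= 0 else -1
    return [(current + step * i) % num_phases for i in range(1, abs(diff) + 1)]
```

Here the target would be (phase control signal + receive clock offset) modulo the number of phases, and each element of the returned list is presented to the receive clock phase selector in turn.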
 123. The method of claim 118 wherein the set of clock signals further comprises a transmit clock signal and wherein each of the subsystems further comprises a transmit section, the transmit sections operating in accordance with the transmit clock signal.
 124. The method of claim 123 further comprising the operations of: inputting a transmit clock offset to a transmit clock phase selector; and generating the transmit clock signal from the transmit clock phase selector.
 125. The method of claim 124 wherein the transmit clock offset is equal to zero.
 126. The method of claim 123 wherein the transmit clock signal is related to one of the sampling clock signals.
 127. The method of claim 126 further comprising the operations of: inputting one of the phase control signals to a transmit clock phase selector; and generating the transmit clock signal from the transmit clock phase selector.
 128. The method of claim 118 wherein each of the phase detectors receives a corresponding slicer error and a corresponding tentative decision from a decoding system.
 129. The method of claim 128 wherein operation (a) comprises: (1) generating a pre-cursor phase error by multiplying the corresponding tentative decision by a delayed version of the corresponding slicer error; (2) generating a post-cursor phase error by multiplying the corresponding slicer error by a delayed version of the corresponding tentative decision; (3) combining the pre-cursor and post-cursor phase errors to produce the corresponding phase error.
 130. The method of claim 129 wherein operations (1), (2) and (3) are performed via a lattice structure, the lattice structure comprising two delay elements, two multipliers and an adder.
 131. The method of claim 129 wherein operation (3) includes the operation of combining the pre-cursor, post-cursor phase errors and an offset input from a control unit to produce the corresponding phase error.
 132. The method of claim 118 wherein operation (c) comprises: (1) accumulating a number of consecutive values of one of the phase errors via a first filter, resulting in a sum value; (2) outputting the sum value from the first filter; (3) integrating the sum value via a second filter to produce an integral value; and (4) combining the sum value and the integral value to produce a filtered phase error.
 133. The method of claim 132 wherein operation (3) includes the operation of scaling the integrated sum value by a scale factor to produce the integral value.
 134. The method of claim 132 wherein operation (c) further comprises, before operation (3), the operation of multiplying the sum value by a factor different than 1 when the system is operating in a different bandwidth mode.
 135. The method of claim 118 wherein operation (e) comprises the operation of filtering recursively the filtered phase errors to produce the corresponding phase control signals.
 136. The method of claim 135 wherein operation (e) further comprises the operation of scaling, before filtering recursively, the filtered phase errors by a scale factor.
 137. The method of claim 118 wherein operation (g) comprises the operations of: (1) inputting a multi-phase input signal from a clock generator to each of the phase selectors; and (2) selecting at each of the phase selectors one of the phases of the multi-phase input signal based on the phase control signal received from the corresponding oscillator.
 138. A timing recovery system for generating a set of clock signals in a processing system, the set of clock signals comprising a set of sampling clock signals, the processing system comprising a set of processing subsystems, each of the processing subsystems comprising an analog section, each of the analog sections operating in accordance with a corresponding one of the sampling clock signals, the timing recovery system comprising: (a) a set of phase detectors generating phase errors for the corresponding sampling clock signals; (b) a set of loop filters coupled to the corresponding phase detectors, the loop filters receiving the corresponding phase errors and generating filtered phase errors; (c) a set of oscillators coupled to the corresponding loop filters, the oscillators receiving the filtered phase errors and generating phase control signals; and (d) a set of phase selectors coupled to the corresponding oscillators, the phase selectors receiving the phase control signals and generating the sampling clock signals.
 139. The timing recovery system of claim 138 wherein the set of clock signals further comprises a receive clock signal and wherein each of the processing subsystems further comprises a digital section, the digital sections operating in accordance with the receive clock signal.
 140. The timing recovery system of claim 139 wherein the receive clock signal is related to one of the sampling clock signals.
 141. The timing recovery system of claim 140 further comprising a first adder and a receive clock phase selector, the first adder receiving one of the phase control signals and a receive clock offset and generating a phase shift value, the receive clock phase selector receiving the phase shift value and generating the receive clock signal.
 142. The timing recovery system of claim 141 wherein the phase shift value comprises a set of phase steps and wherein the receive clock phase selector receives the phase shift value in the form of consecutive phase steps.
 143. The timing recovery system of claim 138 wherein the set of clock signals further comprises a transmit clock signal and wherein each of the subsystems further comprises a transmit section, the transmit sections operating in accordance with the transmit clock signal.
 144. The timing recovery system of claim 143 further comprising a transmit clock phase selector, the transmit clock phase selector receiving a transmit clock offset and generating the transmit clock signal.
 145. The timing recovery system of claim 144 wherein the transmit clock offset is equal to zero.
 146. The timing recovery system of claim 143 wherein the transmit clock signal is related to one of the sampling clock signals.
 147. The timing recovery system of claim 146 further comprising a transmit clock phase selector, the transmit clock phase selector receiving one of the phase control signals and generating the transmit clock signal.
 148. The timing recovery system of claim 138 wherein each of the phase detectors receives a corresponding slicer error and a corresponding tentative decision from a decoding system.
 149. The timing recovery system of claim 148 wherein each of the phase detectors comprises a lattice structure, the lattice structure comprising two delay elements, two multipliers and an adder, the lattice structure generating a pre-cursor phase error by multiplying the corresponding tentative decision by a delayed version of the corresponding slicer error and generating a post-cursor phase error by multiplying the corresponding slicer error by a delayed version of the corresponding tentative decision and combining the pre-cursor and post-cursor phase errors to produce the corresponding phase error.
 150. The timing recovery system of claim 149 wherein at least one of the phase detectors further receives an offset input from a control unit and wherein the associated lattice structure combines the pre-cursor, post-cursor phase errors and the offset input to produce the corresponding phase error.
 151. The timing recovery system of claim 138 wherein at least one of the loop filters comprises a first filter for accumulating a number of consecutive values of one of the phase errors to produce a filtered phase error.
 152. The timing recovery system of claim 138 wherein at least one of the loop filters comprises a first filter for accumulating a number of consecutive values of one of the phase errors to produce a sum value, a second filter for integrating the sum value to produce an integral value and an adder for combining the sum value and the integral value to produce a filtered phase error.
 153. The timing recovery system of claim 152 wherein the second filter includes a multiplier for scaling the integrated sum value by a scale factor to produce the integral value.
 154. The timing recovery system of claim 152 wherein at least one of the loop filters further comprises a multiplier for multiplying the sum value by a factor different than 1 when the system is operating in a different bandwidth mode.
 155. The timing recovery system of claim 138 wherein each of the oscillators comprises an infinite impulse response filter for filtering recursively the filtered phase errors to produce the corresponding phase control signals.
 156. The timing recovery system of claim 155 wherein at least one of the oscillators further comprises a multiplier for scaling the filtered phase errors by a scale factor and outputting the scaled filtered phase errors to the associated impulse response filter.
 157. The timing recovery system of claim 138 wherein each of the phase selectors receives a multi-phase input signal from a clock generator and selects one of the phases of the multi-phase input signal based on the phase control signal received from the corresponding oscillator.
 158. A timing recovery system for generating a set of clock signals in a processing system, the set of clock signals comprising a set of sampling clock signals, the processing system comprising a set of processing subsystems, each of the processing subsystems comprising an analog section, each of the analog sections operating in accordance with a corresponding one of the sampling clock signals, the timing recovery system comprising: (a) a set of phase detectors generating phase errors for the corresponding sampling clock signals; (b) a set of loop filters coupled to the corresponding phase detectors, the loop filters receiving the corresponding phase errors and generating filtered phase errors; (c) a set of digital-to-analog (D/A) converters coupled to the loop filters, the D/A converters receiving the filtered phase errors and generating analog filtered phase errors; and (d) a set of oscillators coupled to the corresponding D/A converters, the oscillators receiving the analog filtered phase errors and generating the sampling clock signals.
 159. The timing recovery system of claim 158 wherein the oscillators comprise varactor diodes.